IcePorge-Malware-RAG

AI-Powered Malware Analysis with Retrieval-Augmented Generation

Part of the IcePorge Malware Analysis Stack.

License: MIT


Overview

Malware RAG provides context-enriched AI responses for malware analysis by combining a curated knowledge base (SANS FOR610/SEC504) with local LLM inference via Ollama. It uses ChromaDB for vector storage and semantic search.

Key Features

  • RAG Pipeline - Retrieval-Augmented Generation grounds answers in retrieved course material
  • FOR610 Knowledge Base - SANS malware analysis course material
  • Local LLM - Privacy-focused inference with Ollama (no cloud APIs)
  • Vector Search - Semantic similarity search with ChromaDB
  • REST API - Easy integration with analysis tools

Architecture

    +--------------------------------------------------+
    |              Malware RAG (ki01)                  |
    |                                                  |
    |  +--------------------------------------------+  |
    |  |              rag_api.py                    |  |
    |  |          Flask REST API (:5001)            |  |
    |  +--------------------------------------------+  |
    |         |                        |               |
    |         v                        v               |
    |  +-------------+          +-------------+        |
    |  |  ChromaDB   |          |   Ollama    |        |
    |  | (Vectors)   |          | (qwen2.5)   |        |
    |  +-------------+          +-------------+        |
    |         ^                                        |
    |         |                                        |
    |  +-------------+                                 |
    |  | FOR610/SEC  |                                 |
    |  | 504 PDFs    |                                 |
    |  +-------------+                                 |
    +--------------------------------------------------+

    Query Flow:
    1. User query received
    2. Query embedded via SentenceTransformer
    3. Similar chunks retrieved from ChromaDB
    4. Context + Query sent to Ollama
    5. Enriched response returned
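
The same flow in Python, as a minimal sketch (the prompt wording and the defaults used here are illustrative; the authoritative implementation is rag_api.py):

import chromadb
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
store = chromadb.PersistentClient(path="/opt/malware-rag/chroma_db")
collection = store.get_collection("for610")

def rag_query(query, n_results=4):
    # Steps 2-3: embed the query and fetch the most similar chunks
    embedding = embedder.encode(query).tolist()
    hits = collection.query(query_embeddings=[embedding], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    # Step 4: send retrieved context plus the question to Ollama
    prompt = f"Answer using this course material:\n\n{context}\n\nQuestion: {query}"
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen2.5-coder:14b", "prompt": prompt, "stream": False},
    )
    # Step 5: return the context-enriched answer
    return reply.json()["response"]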

Components

File                  Description
rag_api.py            Flask REST API with RAG pipeline
ingest_documents.py   PDF ingestion and chunking

Installation

Prerequisites

  • Python 3.10+
  • Ollama with qwen2.5-coder:14b model
  • 16GB+ RAM (for embeddings and LLM)

Setup

# Clone repository
git clone https://github.com/icepaule/IcePorge-Malware-RAG.git
cd IcePorge-Malware-RAG

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install flask flask-cors chromadb sentence-transformers requests langchain langchain-community pypdf

# Ingest documents (place PDFs in documents/ folder)
python ingest_documents.py /path/to/FOR610.pdf /path/to/SEC504.pdf

# Start API server
python rag_api.py

API Endpoints

Health Check

curl http://localhost:5001/health

Response:

{
  "status": "healthy",
  "service": "malware-rag",
  "collections": ["for610"],
  "ollama_available": true
}

Query with RAG Context

curl -X POST http://localhost:5001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I identify process injection techniques?", "collection": "for610"}'

Response:

{
  "response": "Process injection can be identified by...",
  "sources": [
    {"content": "...", "source": "FOR610-book1.pdf", "page": 45, "relevance": 0.89}
  ],
  "model": "qwen2.5-coder:14b"
}

Direct LLM Query (no RAG)

curl -X POST http://localhost:5001/llm \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain RC4 encryption in malware"}'

Document Ingestion

Supported Formats

  • PDF (via PyPDF)
  • Text files

Ingestion Process

# Single document
python ingest_documents.py /path/to/document.pdf

# Multiple documents
python ingest_documents.py /path/to/doc1.pdf /path/to/doc2.pdf

# Custom collection name
python ingest_documents.py --collection sec504 /path/to/SEC504.pdf

Chunking Configuration

Parameter          Default            Description
chunk_size         1000               Characters per chunk
chunk_overlap      200                Character overlap between adjacent chunks
embedding_model    all-MiniLM-L6-v2   SentenceTransformer model
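
These defaults map directly onto LangChain's text splitter. A sketch of the core ingestion step, assuming ingest_documents.py pairs PyPDFLoader with RecursiveCharacterTextSplitter (both installed in Setup):

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Defaults from the table above
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

pages = PyPDFLoader("/path/to/FOR610.pdf").load()  # one Document per PDF page
chunks = splitter.split_documents(pages)           # ~1000-char chunks, 200-char overlap
print(f"{len(pages)} pages -> {len(chunks)} chunks")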

Configuration

Environment Variables

Variable        Default                      Description
CHROMA_PATH     /opt/malware-rag/chroma_db   ChromaDB storage path
OLLAMA_URL      http://localhost:11434       Ollama API endpoint
DEFAULT_MODEL   qwen2.5-coder:14b            Default LLM model
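
A sketch of how rag_api.py can consume these variables (the exact handling inside the script is an assumption):

import os

# Fall back to the documented defaults when a variable is unset
CHROMA_PATH = os.environ.get("CHROMA_PATH", "/opt/malware-rag/chroma_db")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "qwen2.5-coder:14b")

Set a variable at launch to override a default, e.g. OLLAMA_URL=http://gpu-host:11434 python rag_api.py.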

Integration Examples

Python Client

import requests

def analyze_with_context(query):
    response = requests.post(
        "http://ki01:5001/query",
        json={"query": query, "collection": "for610"}
    )
    return response.json()

# Example: Get context-aware analysis
result = analyze_with_context("What APIs are commonly used for keylogging?")
print(result["response"])

Command Line

# Quick query
curl -s http://ki01:5001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain API hashing"}' | jq .response

Knowledge Base Collections

Collection   Content                                      Documents
for610       SANS FOR610 - Reverse Engineering Malware    Course PDFs
sec504       SANS SEC504 - Incident Response              Course PDFs

Service Management

# Start as systemd service
sudo systemctl start malware-rag

# View logs
sudo journalctl -u malware-rag -f

# Check status
sudo systemctl status malware-rag
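
The commands above assume a malware-rag unit is already installed. If it is not, a minimal unit file sketch could be saved as /etc/systemd/system/malware-rag.service (paths follow the Setup section and the Ollama service name is an assumption):

[Unit]
Description=Malware RAG API
After=network-online.target ollama.service

[Service]
WorkingDirectory=/opt/malware-rag
ExecStart=/opt/malware-rag/venv/bin/python rag_api.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

Run sudo systemctl daemon-reload and sudo systemctl enable malware-rag afterwards.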

Performance Tuning

GPU Acceleration

For faster LLM inference, ensure Ollama uses GPU:

# Check GPU usage
nvidia-smi

# Ollama automatically uses GPU if available
ollama run qwen2.5-coder:14b

Memory Optimization

  • Reduce n_results in queries to retrieve fewer context chunks
  • Use a smaller embedding model for faster search
  • Adjust num_ctx in Ollama to shrink the context window (see the sketch below)
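
For example, a trimmed retrieval and a smaller context window (whether rag_api.py forwards an n_results field is an assumption; num_ctx is a standard per-request Ollama option):

import requests

# Fewer retrieved chunks -> smaller prompt (assumes /query accepts "n_results")
requests.post(
    "http://ki01:5001/query",
    json={"query": "Explain API hashing", "collection": "for610", "n_results": 2},
)

# Smaller context window -> less VRAM, via Ollama's per-request options
requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:14b",
        "prompt": "Explain RC4 encryption in malware",
        "stream": False,
        "options": {"num_ctx": 4096},
    },
)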

License

MIT License - See LICENSE

Author: Michael Pauli