IcePorge-Malware-RAG

AI-Powered Malware Analysis with Retrieval-Augmented Generation

Part of the IcePorge Malware Analysis Stack.

License: MIT


Overview

Malware RAG provides context-enriched AI responses for malware analysis by combining a curated knowledge base (SANS FOR610/SEC504) with local LLM inference via Ollama. It uses ChromaDB for vector storage and semantic search.

Key Features

  • RAG Pipeline - Retrieval-Augmented Generation grounds answers in retrieved course material
  • FOR610 Knowledge Base - SANS malware analysis course material
  • Local LLM - Privacy-focused inference with Ollama (no cloud APIs)
  • Vector Search - Semantic similarity search with ChromaDB
  • REST API - Easy integration with analysis tools

Architecture

    +--------------------------------------------------+
    |              Malware RAG (ki01)                  |
    |                                                  |
    |  +--------------------------------------------+  |
    |  |              rag_api.py                    |  |
    |  |          Flask REST API (:5001)            |  |
    |  +--------------------------------------------+  |
    |         |                        |               |
    |         v                        v               |
    |  +-------------+          +-------------+        |
    |  |  ChromaDB   |          |   Ollama    |        |
    |  | (Vectors)   |          | (qwen2.5)   |        |
    |  +-------------+          +-------------+        |
    |         ^                                        |
    |         |                                        |
    |  +-------------+                                 |
    |  | FOR610/SEC  |                                 |
    |  | 504 PDFs    |                                 |
    |  +-------------+                                 |
    +--------------------------------------------------+

    Query Flow:
    1. User query received
    2. Query embedded via SentenceTransformer
    3. Similar chunks retrieved from ChromaDB
    4. Context + Query sent to Ollama
    5. Enriched response returned
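
The same flow in Python, as a minimal sketch (the prompt wording and the defaults used here are illustrative; the authoritative implementation is rag_api.py):

import chromadb
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
store = chromadb.PersistentClient(path="/opt/malware-rag/chroma_db")
collection = store.get_collection("for610")

def rag_query(query, n_results=4):
    # Steps 2-3: embed the query and fetch the most similar chunks
    embedding = embedder.encode(query).tolist()
    hits = collection.query(query_embeddings=[embedding], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    # Step 4: send retrieved context plus the question to Ollama
    prompt = f"Answer using this course material:\n\n{context}\n\nQuestion: {query}"
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen2.5-coder:14b", "prompt": prompt, "stream": False},
    )
    # Step 5: return the context-enriched answer
    return reply.json()["response"]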

Components

File                  Description
rag_api.py            Flask REST API with RAG pipeline
ingest_documents.py   PDF ingestion and chunking

Installation

Prerequisites

  • Python 3.10+
  • Ollama with qwen2.5-coder:14b model
  • 16GB+ RAM (for embeddings and LLM)

Setup

# Clone repository
git clone https://github.com/icepaule/IcePorge-Malware-RAG.git
cd IcePorge-Malware-RAG

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install flask flask-cors chromadb sentence-transformers requests langchain langchain-community pypdf

# Ingest documents (place PDFs in documents/ folder)
python ingest_documents.py /path/to/FOR610.pdf /path/to/SEC504.pdf

# Start API server
python rag_api.py

API Endpoints

Health Check

curl http://localhost:5001/health

Response:

{
  "status": "healthy",
  "service": "malware-rag",
  "collections": ["for610"],
  "ollama_available": true
}

Query with RAG Context

curl -X POST http://localhost:5001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I identify process injection techniques?", "collection": "for610"}'

Response:

{
  "response": "Process injection can be identified by...",
  "sources": [
    {"content": "...", "source": "FOR610-book1.pdf", "page": 45, "relevance": 0.89}
  ],
  "model": "qwen2.5-coder:14b"
}

Direct LLM Query (no RAG)

curl -X POST http://localhost:5001/llm \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain RC4 encryption in malware"}'

Document Ingestion

Supported Formats

  • PDF (via PyPDF)
  • Text files

Ingestion Process

# Single document
python ingest_documents.py /path/to/document.pdf

# Multiple documents
python ingest_documents.py /path/to/doc1.pdf /path/to/doc2.pdf

# Custom collection name
python ingest_documents.py --collection sec504 /path/to/SEC504.pdf

Chunking Configuration

Parameter          Default            Description
chunk_size         1000               Characters per chunk
chunk_overlap      200                Character overlap between adjacent chunks
embedding_model    all-MiniLM-L6-v2   SentenceTransformer model
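
These defaults map directly onto LangChain's text splitter. A sketch of the core ingestion step, assuming ingest_documents.py pairs PyPDFLoader with RecursiveCharacterTextSplitter (both installed in Setup):

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Defaults from the table above
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

pages = PyPDFLoader("/path/to/FOR610.pdf").load()  # one Document per PDF page
chunks = splitter.split_documents(pages)           # ~1000-char chunks, 200-char overlap
print(f"{len(pages)} pages -> {len(chunks)} chunks")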

Configuration

Environment Variables

Variable        Default                      Description
CHROMA_PATH     /opt/malware-rag/chroma_db   ChromaDB storage path
OLLAMA_URL      http://localhost:11434       Ollama API endpoint
DEFAULT_MODEL   qwen2.5-coder:14b            Default LLM model
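
A sketch of how rag_api.py can consume these variables (the exact handling inside the script is an assumption):

import os

# Fall back to the documented defaults when a variable is unset
CHROMA_PATH = os.environ.get("CHROMA_PATH", "/opt/malware-rag/chroma_db")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "qwen2.5-coder:14b")

Set a variable at launch to override a default, e.g. OLLAMA_URL=http://gpu-host:11434 python rag_api.py.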

Integration Examples

Python Client

import requests

def analyze_with_context(query):
    response = requests.post(
        "http://ki01:5001/query",
        json={"query": query, "collection": "for610"}
    )
    return response.json()

# Example: Get context-aware analysis
result = analyze_with_context("What APIs are commonly used for keylogging?")
print(result["response"])

Command Line

# Quick query
curl -s http://ki01:5001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain API hashing"}' | jq .response

Knowledge Base Collections

Collection   Content                                      Documents
for610       SANS FOR610 - Reverse Engineering Malware    Course PDFs
sec504       SANS SEC504 - Incident Response              Course PDFs

Service Management

# Start as systemd service
sudo systemctl start malware-rag

# View logs
sudo journalctl -u malware-rag -f

# Check status
sudo systemctl status malware-rag
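
The commands above assume a malware-rag unit is already installed. If it is not, a minimal unit file sketch could be saved as /etc/systemd/system/malware-rag.service (paths follow the Setup section and the Ollama service name is an assumption):

[Unit]
Description=Malware RAG API
After=network-online.target ollama.service

[Service]
WorkingDirectory=/opt/malware-rag
ExecStart=/opt/malware-rag/venv/bin/python rag_api.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

Run sudo systemctl daemon-reload and sudo systemctl enable malware-rag afterwards.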

Performance Tuning

GPU Acceleration

For faster LLM inference, ensure Ollama uses GPU:

# Check GPU usage
nvidia-smi

# Ollama automatically uses GPU if available
ollama run qwen2.5-coder:14b

Memory Optimization

  • Reduce n_results in queries to retrieve fewer context chunks
  • Use a smaller embedding model for faster search
  • Adjust num_ctx in Ollama to shrink the context window (see the sketch below)
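
For example, a trimmed retrieval and a smaller context window (whether rag_api.py forwards an n_results field is an assumption; num_ctx is a standard per-request Ollama option):

import requests

# Fewer retrieved chunks -> smaller prompt (assumes /query accepts "n_results")
requests.post(
    "http://ki01:5001/query",
    json={"query": "Explain API hashing", "collection": "for610", "n_results": 2},
)

# Smaller context window -> less VRAM, via Ollama's per-request options
requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:14b",
        "prompt": "Explain RC4 encryption in malware",
        "stream": False,
        "options": {"num_ctx": 4096},
    },
)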

License

MIT License - See LICENSE

Author: Michael Pauli