How to Train

Complete guide to training EDR evasion models with malagent

This guide walks you through training an EDR evasion model from scratch. Start with the quick start for immediate results, then explore advanced sections for optimization.

TL;DR - Quick Start (10 minutes)

Already have a Windows DEVBOX with MSVC? Run training immediately:

# 1. Enter the toolbox
toolbox enter malagent

# 2. Configure Windows connection
malagent setup --minimal

# 3. Test connection
malagent test --level smoke

# 4. Start training (MVR mode)
malagent raft train \
    --mode mvr \
    --prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
    --cycles 6

That’s it. Training will begin and produce checkpoints as it progresses.


Prerequisites Checklist

Before training, ensure you have:

Hardware

Component  Minimum                              Recommended
GPU        24GB VRAM                            48GB+ (Strix Halo)
RAM        32GB                                 64GB+
Storage    100GB SSD                            500GB NVMe
Network    Stable connection to Windows server  Same subnet

Software (Training Host)

  • Linux (Fedora 41+ recommended)
  • Podman/Docker for toolbox
  • ROCm or CUDA drivers

Infrastructure

Mode           Requirements
MVR (Minimal)  Windows DEVBOX with MSVC
Elastic        + Elastic Security deployment
Full           + Proxmox with VM pool

Part 1: Setup

Option A: Minimal Setup (MVR Mode)

MVR (Minimum Viable Reward) mode uses compilation success as the reward signal. This is the fastest way to start.

1. Prepare Windows DEVBOX

On your Windows machine:

# Enable OpenSSH Server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic

# Install Visual Studio Build Tools
# Download from: https://visualstudio.microsoft.com/downloads/
# Select "Desktop development with C++"

2. Configure SSH Key

On your training host:

# Generate key if needed
ssh-keygen -t ed25519 -f ~/.ssh/win

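# Copy to Windows
# Note: ssh-copy-id assumes a POSIX shell on the remote side and can fail against
# Windows OpenSSH, whose default shell is cmd.exe. If it fails, append the contents
# of ~/.ssh/win.pub to %USERPROFILE%\.ssh\authorized_keys on the Windows machine
# (administrator accounts use C:\ProgramData\ssh\administrators_authorized_keys).
ssh-copy-id -i ~/.ssh/win.pub user@windows-ip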

# Test connection
ssh -i ~/.ssh/win user@windows-ip "echo Hello from Windows"

3. Run Setup Wizard

malagent setup --minimal

The wizard will:

  1. Test SSH connection
  2. Verify MSVC is available
  3. Create configuration files

4. Verify Setup

malagent test --level smoke

Expected output:

Testing Windows DEVBOX connection...    ✓
Testing MSVC compilation...             ✓
Testing code extraction...              ✓

All smoke tests passed!

Option B: Standard Setup (Elastic Mode)

Elastic mode provides graduated rewards based on EDR detection severity.

1. Complete MVR Setup (above)

2. Deploy Elastic Security

You need:

  • Elasticsearch cluster
  • Kibana
  • Elastic Agent with Defend integration on test VMs

3. Configure Elastic Connection

malagent setup  # Select "Standard" mode

Or manually create configs/elastic_verifier.yaml:

elastic:
  host: "10.0.20.145"
  port: 9200
  username: "elastic"
  password: "your_password"
  ssl_verify: false

  detection:
    timeout: 120
    poll_interval: 5

4. Verify Elastic Connection

malagent elastic rules

Option C: Full Setup (Proxmox Orchestration)

Full mode adds VM pool management for parallel sample execution.

1. Complete Elastic Setup (above)

2. Create VM Template

In Proxmox:

  • Create Windows 10/11 VM
  • Install Elastic Agent
  • Convert to template

3. Configure Proxmox

malagent setup --full

Or create configs/proxmox.yaml:

proxmox:
  host: "proxmox.local"
  user: "root@pam"
  token_name: "malagent"
  token_value: "your-token"
  
  vm_pool:
    template_id: 100
    pool_size: 4
    snapshot_name: "clean"

Part 2: Your First Training Run

Understanding the RAFT Cycle

RAFT (Reward-rAnked Fine-Tuning) works in cycles:

┌──────────────────────────────────────────────────┐
│                    RAFT CYCLE                    │
├──────────────────────────────────────────────────┤
│                                                  │
│  GENERATE ──► VERIFY ──► FILTER ──► TRAIN        │
│      │           │          │          │         │
│      ▼           ▼          ▼          ▼         │
│  8 samples   Compile    Keep top    Fine-tune    │
│  per prompt  + Detect   by reward   on winners   │
│                                                  │
│  ◄─────────── REPEAT 6 TIMES ────────────────►   │
└──────────────────────────────────────────────────┘
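
The four stages in the diagram can be sketched as a single function. This is only an illustrative outline, not malagent's implementation: `raft_cycle` and the `generate`, `verify`, and `train` callables are hypothetical names supplied by the caller, and the filter is assumed to be a simple reward threshold.

```python
from typing import Callable, List, Tuple


def raft_cycle(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    verify: Callable[[str], float],
    train: Callable[[List[Tuple[str, str]]], None],
    samples_per_prompt: int = 8,
    reward_threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Run one generate -> verify -> filter -> train pass and return the kept pairs."""
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        # GENERATE: draw several candidate completions for the same prompt.
        candidates = generate(prompt, samples_per_prompt)
        for completion in candidates:
            # VERIFY: the verifier returns a scalar reward for each candidate.
            reward = verify(completion)
            # FILTER: keep only candidates at or above the threshold.
            if reward >= reward_threshold:
                kept.append((prompt, completion))
    # TRAIN: fine-tune on the surviving (prompt, completion) pairs.
    train(kept)
    return kept
```

Passing the stages in as callables keeps the sketch independent of any particular model, verifier, or trainer.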

Running MVR Training

malagent raft train \
    --mode mvr \
    --prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
    --cycles 6 \
    --samples-per-prompt 8

Interpreting Results

Watch the training output:

RAFT CYCLE 1/6
==============
Generating samples... 569 prompts × 8 samples
Verifying 4552 samples...
  Compiled: 1823 (40.0%)
  Failed: 2729

Filtering samples...
  Kept: 912 samples (reward >= 0.5)

Training on filtered samples...
  Loss: 0.856 → 0.342

Saving checkpoint to output/raft/cycle_1/

Key Metrics to Watch:

  • Compile rate: Higher is better; it indicates the model is generating valid code
  • Loss decrease: Loss should trend downward across cycles
  • Kept samples: More kept samples means more training signal for the next cycle
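
The numbers in the sample output follow from three raw counts. The helper below is a hypothetical convenience (`cycle_metrics` is not a malagent function) that recomputes them, so you can check logged percentages yourself.

```python
def cycle_metrics(generated: int, compiled: int, kept: int) -> dict:
    """Derive the summary numbers reported for one cycle from raw counts."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return {
        "failed": generated - compiled,
        "compile_rate_pct": 100.0 * compiled / generated,
        "kept_pct_of_generated": 100.0 * kept / generated,
    }


# Counts taken from the cycle 1 sample output: 569 prompts x 8 samples.
metrics = cycle_metrics(generated=569 * 8, compiled=1823, kept=912)
print(metrics["failed"])  # 2729
```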

When to Stop

Monitor compile rate and loss across cycles. Training may plateau or degrade after several cycles.

General guidance:

  • Stop when compile rate decreases for 2 consecutive cycles
  • Stop when loss starts increasing
  • The optimal number of cycles varies by dataset and model
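
The two stop rules above can be written as a small check over per-cycle history. This is a sketch under assumptions: `should_stop` is a hypothetical helper, the lists hold one value per completed cycle (oldest first), and "loss" means the end-of-cycle training loss.

```python
from typing import Sequence


def should_stop(compile_rates: Sequence[float], losses: Sequence[float]) -> bool:
    """Return True once either stop condition from the guidance above is met."""
    # Compile rate fell in each of the last two cycles (needs three data points).
    rate_dropped_twice = (
        len(compile_rates) >= 3
        and compile_rates[-1] < compile_rates[-2] < compile_rates[-3]
    )
    # End-of-cycle loss is higher than in the previous cycle.
    loss_rising = len(losses) >= 2 and losses[-1] > losses[-2]
    return rate_dropped_twice or loss_rising
```

Because the optimal number of cycles varies by dataset and model, treat the result as a prompt to inspect the run rather than a hard rule.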

Part 3: Multi-Language Training

C++ with MSVC (Default)

malagent raft train --mode mvr --language msvc

C++ with MinGW (Local)

No Windows server needed:

# Install MinGW
sudo dnf install mingw64-gcc-c++

# Train
malagent raft train --mode mvr --language mingw

Rust

# Install target (or let malagent do it)
rustup target add x86_64-pc-windows-gnu

# Train
malagent raft train --mode mvr --language rust

Go

malagent raft train --mode mvr --language go

C#/.NET

# Install .NET SDK
sudo dnf install dotnet-sdk-8.0

# Train
malagent raft train --mode mvr --language dotnet

PowerShell

malagent raft train --mode mvr --language powershell

Part 4: Advanced Training

Elastic Mode (Detection Rewards)

Once Elastic is configured:

malagent raft train \
    --mode elastic \
    --config configs/elastic_verifier.yaml \
    --cycles 6

Graduated rewards:

DetectionRewardTraining Signal
Critical0.5Keep, but low priority
High0.6Keep
Medium0.7Keep
Low0.8Good
Evaded1.0Best

Using Distillation

Generate high-quality training data with external LLMs:

# Generate samples with Claude
malagent distill run \
    --provider anthropic \
    --model claude-sonnet-4-20250514 \
    --prompts data/prompts/techniques.jsonl \
    --output distillation_output \
    --budget 20.0

# Export verified samples
malagent distill export \
    --samples-dir distillation_output/samples \
    --output sft_data.jsonl \
    --min-reward 0.5

# Train on distilled data
malagent sft train --dataset sft_data.jsonl

Fix Agent Integration

Automatically attempt to fix compilation failures using an external LLM:

malagent raft train \
    --mode mvr \
    --fix-agent \
    --fix-provider anthropic \
    --fix-model claude-sonnet-4-20250514 \
    --max-fix-attempts 2

Note: This incurs external API costs. Results vary depending on error types.

Custom Prompts

Create your own prompt dataset:

{"prompt": "Implement direct syscall for NtAllocateVirtualMemory in C++"}
{"prompt": "Write C++ code that enumerates loaded modules via PEB"}
{"prompt": "Implement APC injection technique in C++"}

Then pass the file to training:

malagent raft train --prompts my_prompts.jsonl
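
A malformed line in a prompt file is easier to catch before a run starts. The sketch below assumes the only required field is a string `prompt`, as in the records above. `load_prompts` is a hypothetical helper, not part of malagent.

```python
import json
from typing import Iterable, List


def load_prompts(lines: Iterable[str]) -> List[str]:
    """Parse JSONL prompt records, raising ValueError on the first malformed line."""
    prompts = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict) or not isinstance(record.get("prompt"), str):
            raise ValueError(
                f"line {number}: expected an object with a string 'prompt' field"
            )
        prompts.append(record["prompt"])
    return prompts
```

To check a file, open it with `open("my_prompts.jsonl", encoding="utf-8")` and pass the file object to `load_prompts`.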

Part 5: Optimizing Training

Using Lint Pre-Checks

Enable fast rejection of obviously broken code:

# In raft_config.yaml
verifier:
  enable_lint: true

This saves compilation time by catching:

  • Missing includes
  • Unbalanced braces
  • Missing main function
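
To illustrate the kind of check involved, here is a minimal textual pre-check for C++ sources. It is an assumption-laden sketch rather than malagent's linter: `quick_lint` is a hypothetical name, the brace count ignores strings and comments, and "missing includes" is approximated as "no #include at all".

```python
import re
from typing import List


def quick_lint(source: str) -> List[str]:
    """Return a list of problems found by cheap textual checks (empty if none)."""
    problems = []
    # Unbalanced braces: naive count that ignores braces in strings and comments.
    if source.count("{") != source.count("}"):
        problems.append("unbalanced braces")
    # Missing entry point: look for main, wmain, WinMain, or wWinMain followed by '('.
    if not re.search(r"\b(?:w?main|w?WinMain)\s*\(", source):
        problems.append("missing main function")
    # Missing includes: a source file with no #include at all is suspicious.
    if not re.search(r"^\s*#\s*include\b", source, flags=re.MULTILINE):
        problems.append("no #include directives")
    return problems
```

Checks like these cost microseconds per sample, whereas a remote compile costs a network round trip, which is where the time saving comes from.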

Using Metrics

Track error patterns for dataset refinement:

verifier:
  track_metrics: true
  log_failed_samples: true

After training:

malagent info --metrics

Binary Caching

Save compiled binaries for later analysis:

verifier:
  binary_cache_dir: "./binary_cache"

Hyperparameter Tuning

Parameter           Default  Tuning Notes
samples_per_prompt  8        More = better diversity, slower
temperature         0.7      Higher = more diverse, lower quality
reward_threshold    0.5      Higher = stricter filtering
learning_rate       5e-5     Lower if loss unstable
cycles              6        Stop when compile rate drops

Memory Optimization (Strix Halo)

For unified memory systems:

training:
  batch_size: 2
  gradient_accumulation: 16
  bf16: true
  gradient_checkpointing: true
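
A small batch size combined with gradient accumulation keeps peak memory low while preserving the number of samples per optimizer step. The arithmetic below assumes that `gradient_accumulation` is the number of accumulation steps and that training runs on a single device. `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(
    batch_size: int, gradient_accumulation: int, num_devices: int = 1
) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * gradient_accumulation * num_devices


print(effective_batch_size(batch_size=2, gradient_accumulation=16))  # 32
```

If you lower `batch_size` to fit in memory, raising `gradient_accumulation` by the same factor keeps this product unchanged.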

Troubleshooting

Low Compile Rate

Symptoms: <20% compile rate, many format failures

Solutions:

  1. Check prompt quality: do the prompts ask for complete code?
  2. Lower temperature for more consistent output
  3. Add few-shot examples to prompts
  4. Use SFT baseline before RAFT

Training Loss Increasing

Symptoms: Loss goes up after cycle 4-5

Solutions:

  1. Stop training - you’ve peaked
  2. Lower learning rate
  3. Increase reward_threshold to filter more strictly

SSH Connection Failures

Symptoms: Intermittent verification failures

Solutions:

  1. Use SSH key auth (not password)
  2. Increase timeout.connection
  3. Check Windows firewall
  4. Test: ssh -v user@host

Out of Memory

Symptoms: CUDA/ROCm OOM errors

Solutions:

  1. Reduce batch_size
  2. Enable gradient_checkpointing
  3. Use smaller model
  4. For Strix Halo: check unified memory limits

See Troubleshooting for more solutions.


Next Steps