How to Train
Complete guide to training EDR evasion models with malagent
This guide walks you through training an EDR evasion model from scratch. Start with the quick start for immediate results, then explore advanced sections for optimization.
TL;DR - Quick Start (10 minutes)
Already have a Windows DEVBOX with MSVC? Run training immediately:
# 1. Enter the toolbox
toolbox enter malagent
# 2. Configure Windows connection
malagent setup --minimal
# 3. Test connection
malagent test --level smoke
# 4. Start training (MVR mode)
malagent raft train \
--mode mvr \
--prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
--cycles 6
That’s it. Training will begin and produce checkpoints as it progresses.
Prerequisites Checklist
Before training, ensure you have:
Hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU | 24GB VRAM | 48GB+ (Strix Halo) |
| RAM | 32GB | 64GB+ |
| Storage | 100GB SSD | 500GB NVMe |
| Network | Stable connection to Windows server | Same subnet |
Software (Training Host)
- Linux (Fedora 41+ recommended)
- Podman/Docker for toolbox
- ROCm or CUDA drivers
Infrastructure
| Mode | Requirements |
|---|---|
| MVR (Minimal) | Windows DEVBOX with MSVC |
| Elastic | + Elastic Security deployment |
| Full | + Proxmox with VM pool |
Part 1: Setup
Option A: Minimal Setup (MVR Mode)
MVR (Minimum Viable Reward) mode uses compilation success as the reward signal. This is the fastest way to start.
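The reward itself can be as simple as a pass/fail score from the compiler. Below is a rough sketch of that idea, not malagent's actual verifier (which drives MSVC on the Windows DEVBOX over SSH); the sketch uses a local MinGW cross-compiler purely for illustration:

```python
import subprocess
import tempfile
from pathlib import Path

def compile_reward(source_code: str, compiler: str = "x86_64-w64-mingw32-g++") -> float:
    """Return 1.0 if the generated source compiles cleanly, else 0.0.

    Illustrative only: the real MVR verifier compiles with MSVC on the
    Windows DEVBOX over SSH; this sketch shells out to a local cross-compiler.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.cpp"
        src.write_text(source_code)
        result = subprocess.run(
            [compiler, str(src), "-o", str(Path(tmp) / "sample.exe")],
            capture_output=True,
            text=True,
            timeout=120,
        )
        return 1.0 if result.returncode == 0 else 0.0
```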
1. Prepare Windows DEVBOX
On your Windows machine:
# Enable OpenSSH Server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic
# Install Visual Studio Build Tools
# Download from: https://visualstudio.microsoft.com/downloads/
# Select "Desktop development with C++"
2. Configure SSH Key
On your training host:
# Generate key if needed
ssh-keygen -t ed25519 -f ~/.ssh/win
# Copy to Windows
ssh-copy-id -i ~/.ssh/win.pub user@windows-ip
# Test connection
ssh -i ~/.ssh/win user@windows-ip "echo Hello from Windows"
3. Run Setup Wizard
malagent setup --minimal
The wizard will:
- Test SSH connection
- Verify MSVC is available
- Create configuration files
4. Verify Setup
malagent test --level smoke
Expected output:
Testing Windows DEVBOX connection... ✓
Testing MSVC compilation... ✓
Testing code extraction... ✓
All smoke tests passed!
Option B: Standard Setup (Elastic Mode)
Elastic mode provides graduated rewards based on EDR detection severity.
1. Complete MVR Setup (above)
2. Deploy Elastic Security
You need:
- Elasticsearch cluster
- Kibana
- Elastic Agent with Defend integration on test VMs
3. Configure Elastic Connection
malagent setup # Select "Standard" mode
Or manually create configs/elastic_verifier.yaml:
elastic:
  host: "10.0.20.145"
  port: 9200
  username: "elastic"
  password: "your_password"
  ssl_verify: false

detection:
  timeout: 120
  poll_interval: 5
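To sanity-check the connection outside the setup wizard, a short script along these lines works; it reads the same config file and pings the cluster. The use of requests, basic auth, and HTTPS here is an assumption, not malagent's internal client:

```python
import requests
import yaml

# Load the same config the verifier uses.
with open("configs/elastic_verifier.yaml") as f:
    cfg = yaml.safe_load(f)["elastic"]

# Assumes the cluster is served over HTTPS (ssl_verify: false disables cert checks).
resp = requests.get(
    f"https://{cfg['host']}:{cfg['port']}",
    auth=(cfg["username"], cfg["password"]),
    verify=cfg.get("ssl_verify", True),
    timeout=10,
)
resp.raise_for_status()
print("Cluster:", resp.json().get("cluster_name"))
```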
4. Verify Elastic Connection
malagent elastic rules
Option C: Full Setup (Proxmox Orchestration)
Full mode adds VM pool management for parallel sample execution.
1. Complete Elastic Setup (above)
2. Create VM Template
In Proxmox:
- Create Windows 10/11 VM
- Install Elastic Agent
- Convert to template
3. Configure Proxmox
malagent setup --full
Or create configs/proxmox.yaml:
proxmox:
  host: "proxmox.local"
  user: "root@pam"
  token_name: "malagent"
  token_value: "your-token"

vm_pool:
  template_id: 100
  pool_size: 4
  snapshot_name: "clean"
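Behind the scenes, pool management mostly means cloning VMs from the template and rolling each one back to its clean snapshot between samples. Below is a rough sketch of the rollback step against the Proxmox REST API; the node name, token string, and endpoint usage are assumptions for illustration, not malagent's orchestrator code:

```python
import requests

PROXMOX = "https://proxmox.local:8006/api2/json"
# API token auth, matching configs/proxmox.yaml above (hypothetical values).
HEADERS = {"Authorization": "PVEAPIToken=root@pam!malagent=your-token"}

def rollback_to_clean(node: str, vmid: int, snapshot: str = "clean") -> None:
    """Roll a pool VM back to its clean snapshot before the next sample runs."""
    url = f"{PROXMOX}/nodes/{node}/qemu/{vmid}/snapshot/{snapshot}/rollback"
    resp = requests.post(url, headers=HEADERS, verify=False, timeout=30)
    resp.raise_for_status()

rollback_to_clean("pve", 101)
```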
Part 2: Your First Training Run
Understanding the RAFT Cycle
RAFT (Reward-rAnked Fine-Tuning) works in cycles:
┌─────────────────────────────────────────────────┐
│ RAFT CYCLE │
├─────────────────────────────────────────────────┤
│ │
│ GENERATE ──► VERIFY ──► FILTER ──► TRAIN │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ 8 samples Compile Keep top Fine-tune │
│ per prompt + Detect by reward on winners │
│ │
│ ◄─────────── REPEAT 6 TIMES ────────────────► │
└─────────────────────────────────────────────────┘
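In code, one cycle reduces to a generate → verify → filter → train loop. The sketch below shows the shape of it; generate, verify, and fine_tune are placeholder callables for illustration, not malagent's API:

```python
from typing import Callable, Iterable

def raft_cycle(
    prompts: Iterable[str],
    generate: Callable[[str, int], list[str]],    # model sampling
    verify: Callable[[str], float],               # reward: compile / detection score
    fine_tune: Callable[[list[tuple[str, str]]], None],
    samples_per_prompt: int = 8,
    reward_threshold: float = 0.5,
) -> int:
    """One RAFT cycle: sample, score, keep the winners, fine-tune on them."""
    # GENERATE: several completions per prompt.
    samples = [(p, c) for p in prompts for c in generate(p, samples_per_prompt)]
    # VERIFY: score each completion.
    scored = [(p, c, verify(c)) for p, c in samples]
    # FILTER: keep only samples at or above the reward threshold.
    kept = [(p, c) for p, c, r in scored if r >= reward_threshold]
    # TRAIN: fine-tune on the surviving (prompt, completion) pairs.
    fine_tune(kept)
    return len(kept)
```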
Running MVR Training
malagent raft train \
--mode mvr \
--prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
--cycles 6 \
--samples-per-prompt 8
Interpreting Results
Watch the training output:
RAFT CYCLE 1/6
==============
Generating samples... 569 prompts × 8 samples
Verifying 4552 samples...
Compiled: 1823 (40.1%)
Failed: 2729
Filtering samples...
Kept: 912 samples (reward >= 0.5)
Training on filtered samples...
Loss: 0.856 → 0.342
Saving checkpoint to output/raft/cycle_1/
Key Metrics to Watch:
- Compile rate: Higher is better - indicates the model is generating valid code
- Loss decrease: Should trend downward across cycles
- Kept samples: More samples = more training signal
When to Stop
Monitor compile rate and loss across cycles. Training may plateau or degrade after several cycles.
General guidance (a simple stopping check is sketched after this list):
- Stop when compile rate decreases for 2 consecutive cycles
- Stop when loss starts increasing
- The optimal number of cycles varies by dataset and model
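A stopping check over per-cycle metrics might look like the following hypothetical helper (not part of malagent):

```python
def should_stop(compile_rates: list[float], losses: list[float]) -> bool:
    """Early-stopping heuristic over per-cycle metrics (most recent last).

    Stop if the compile rate has dropped for two consecutive cycles,
    or if training loss has started rising.
    """
    rate_dropping = (
        len(compile_rates) >= 3
        and compile_rates[-1] < compile_rates[-2] < compile_rates[-3]
    )
    loss_rising = len(losses) >= 2 and losses[-1] > losses[-2]
    return rate_dropping or loss_rising

# Example: compile rate fell in the last two cycles and loss ticked up -> True.
print(should_stop([0.40, 0.52, 0.61, 0.66, 0.63, 0.58],
                  [0.86, 0.51, 0.40, 0.34, 0.33, 0.35]))
```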
Part 3: Multi-Language Training
C++ with MSVC (Default)
malagent raft train --mode mvr --language msvc
C++ with MinGW (Local)
No Windows server needed:
# Install MinGW
sudo dnf install mingw64-gcc-c++
# Train
malagent raft train --mode mvr --language mingw
Rust
# Install target (or let malagent do it)
rustup target add x86_64-pc-windows-gnu
# Train
malagent raft train --mode mvr --language rust
Go
malagent raft train --mode mvr --language go
C#/.NET
# Install .NET SDK
sudo dnf install dotnet-sdk-8.0
# Train
malagent raft train --mode mvr --language dotnet
PowerShell
malagent raft train --mode mvr --language powershell
Part 4: Advanced Training
Elastic Mode (Detection Rewards)
Once Elastic is configured:
malagent raft train \
--mode elastic \
--config configs/elastic_verifier.yaml \
--cycles 6
Graduated rewards:
| Detection | Reward | Training Signal |
|---|---|---|
| Critical | 0.5 | Keep, but low priority |
| High | 0.6 | Keep |
| Medium | 0.7 | Keep |
| Low | 0.8 | Good |
| Evaded | 1.0 | Best |
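The mapping itself is just a lookup from the worst alert severity observed for a sample to a scalar reward. A sketch mirroring the table above (not malagent's exact code):

```python
# Reward by worst detection severity observed for a sample (see table above).
SEVERITY_REWARD = {
    "critical": 0.5,
    "high": 0.6,
    "medium": 0.7,
    "low": 0.8,
    None: 1.0,  # no alert within the detection timeout
}

def detection_reward(worst_severity: str | None) -> float:
    """Map the worst alert severity for a sample to a graduated reward."""
    return SEVERITY_REWARD[worst_severity]
```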
Using Distillation
Generate high-quality training data with external LLMs:
# Generate samples with Claude
malagent distill run \
--provider anthropic \
--model claude-sonnet-4-20250514 \
--prompts data/prompts/techniques.jsonl \
--output distillation_output \
--budget 20.0
# Export verified samples
malagent distill export \
--samples-dir distillation_output/samples \
--output sft_data.jsonl \
--min-reward 0.5
# Train on distilled data
malagent sft train --dataset sft_data.jsonl
Fix Agent Integration
Automatically attempt to fix compilation failures using an external LLM:
malagent raft train \
--mode mvr \
--fix-agent \
--fix-provider anthropic \
--fix-model claude-sonnet-4-20250514 \
--max-fix-attempts 2
Note: This incurs external API costs. Results vary depending on error types.
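Conceptually, the fix agent is a retry loop that feeds compiler errors back to an external model. A sketch of that loop, with llm_fix standing in for whichever provider is configured (hypothetical names, not malagent's API):

```python
from typing import Callable

def compile_with_fixes(
    source: str,
    compile_fn: Callable[[str], tuple[bool, str]],   # returns (ok, compiler_errors)
    llm_fix: Callable[[str, str], str],               # hypothetical external LLM call
    max_attempts: int = 2,
) -> tuple[bool, str]:
    """Retry loop: on compile failure, ask an external LLM to repair the code.

    Sketch only; each llm_fix call costs API credits, matching the note above.
    """
    ok, errors = compile_fn(source)
    attempts = 0
    while not ok and attempts < max_attempts:
        source = llm_fix(source, errors)   # feed the compiler errors back
        ok, errors = compile_fn(source)
        attempts += 1
    return ok, source
```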
Custom Prompts
Create your own prompt dataset:
{"prompt": "Implement direct syscall for NtAllocateVirtualMemory in C++"}
{"prompt": "Write C++ code that enumerates loaded modules via PEB"}
{"prompt": "Implement APC injection technique in C++"}
malagent raft train --prompts my_prompts.jsonl
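Before training on a custom dataset, it can help to confirm the file is well-formed JSONL with a prompt field on every line. A throwaway validator (not a malagent command) might look like:

```python
import json
import sys

def validate_prompts(path: str) -> int:
    """Check that each line is valid JSON with a non-empty "prompt" key."""
    count = 0
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)          # raises on malformed JSON
            assert record.get("prompt"), f"line {lineno}: missing prompt"
            count += 1
    return count

print(validate_prompts(sys.argv[1]), "prompts OK")
```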
Part 5: Optimizing Training
Using Lint Pre-Checks
Enable fast rejection of obviously broken code:
# In raft_config.yaml
verifier:
  enable_lint: true
This saves compilation time by catching:
- Missing includes
- Unbalanced braces
- Missing main function
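A pre-check of this kind amounts to a few cheap string tests run before the compiler is ever invoked. A rough sketch (illustrative only; malagent's actual lint rules may differ):

```python
def lint_cpp(source: str) -> list[str]:
    """Cheap static checks that reject obviously broken C++ before compiling."""
    problems = []
    if "#include" not in source:
        problems.append("no #include directives")
    if source.count("{") != source.count("}"):
        problems.append("unbalanced braces")
    if "main(" not in source and "WinMain(" not in source:
        problems.append("no main entry point")
    return problems
```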
Using Metrics
Track error patterns for dataset refinement:
verifier:
  track_metrics: true
  log_failed_samples: true
After training:
malagent info --metrics
Binary Caching
Save compiled binaries for later analysis:
verifier:
  binary_cache_dir: "./binary_cache"
Hyperparameter Tuning
| Parameter | Default | Tuning Notes |
|---|---|---|
| samples_per_prompt | 8 | More = better diversity, slower |
| temperature | 0.7 | Higher = more diverse, lower quality |
| reward_threshold | 0.5 | Higher = stricter filtering |
| learning_rate | 5e-5 | Lower if loss unstable |
| cycles | 6 | Stop when compile rate drops |
Memory Optimization (Strix Halo)
For unified memory systems:
training:
  batch_size: 2
  gradient_accumulation: 16
  bf16: true
  gradient_checkpointing: true
Troubleshooting
Low Compile Rate
Symptoms: <20% compile rate, many format failures
Solutions:
- Check prompt quality - are they asking for complete code?
- Lower temperature for more consistent output
- Add few-shot examples to prompts
- Use SFT baseline before RAFT
Training Loss Increasing
Symptoms: Loss goes up after cycle 4-5
Solutions:
- Stop training - you’ve peaked
- Lower learning rate
- Increase reward_threshold for stricter filtering
SSH Connection Failures
Symptoms: Intermittent verification failures
Solutions:
- Use SSH key auth (not password)
- Increase timeout.connection
- Check Windows firewall
- Test with ssh -v user@host
Out of Memory
Symptoms: CUDA/ROCm OOM errors
Solutions:
- Reduce batch_size
- Enable gradient_checkpointing
- Use a smaller model
- For Strix Halo: check unified memory limits
See Troubleshooting for more solutions.
Next Steps
- Research Findings - See training results and analysis
- Techniques Reference - Understand the techniques
- Distillation Pipeline - Generate data with external LLMs
- Verifier Reference - Deep dive into verification