How to Train
Complete guide to training EDR evasion models with malagent
This guide walks you through training an EDR evasion model from scratch. Start with the quick start for immediate results, then explore advanced sections for optimization.
TL;DR - Quick Start (10 minutes)
Already have a Windows DEVBOX with MSVC? Run training immediately:
# 1. Enter the toolbox
toolbox enter malagent
# 2. Configure Windows connection
malagent setup --minimal
# 3. Test connection
malagent test --level smoke
# 4. Start training (MVR mode)
malagent raft train \
--mode mvr \
--prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
--cycles 6
That’s it. Training will begin and produce checkpoints as it progresses.
Prerequisites Checklist
Before training, ensure you have:
Hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU | 24GB VRAM | 48GB+ (Strix Halo) |
| RAM | 32GB | 64GB+ |
| Storage | 100GB SSD | 500GB NVMe |
| Network | Stable connection to Windows server | Same subnet |
Software (Training Host)
- Linux (Fedora 41+ recommended)
- Podman/Docker for toolbox
- ROCm or CUDA drivers
Infrastructure
| Mode | Requirements |
|---|---|
| MVR (Minimal) | Windows DEVBOX with MSVC |
| Elastic | + Elastic Security deployment |
| Full | + Proxmox with VM pool |
Part 1: Setup
Option A: Minimal Setup (MVR Mode)
MVR (Minimum Viable Reward) mode uses compilation success as the reward signal. This is the fastest way to start.
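The reward itself can be as simple as a pass/fail score from the compiler. Below is a rough sketch of that idea, not malagent's actual verifier (which drives MSVC on the Windows DEVBOX over SSH); the sketch uses a local MinGW cross-compiler purely for illustration:

```python
import subprocess
import tempfile
from pathlib import Path

def compile_reward(source_code: str, compiler: str = "x86_64-w64-mingw32-g++") -> float:
    """Return 1.0 if the generated source compiles cleanly, else 0.0.

    Illustrative only: the real MVR verifier compiles with MSVC on the
    Windows DEVBOX over SSH; this sketch shells out to a local cross-compiler.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.cpp"
        src.write_text(source_code)
        result = subprocess.run(
            [compiler, str(src), "-o", str(Path(tmp) / "sample.exe")],
            capture_output=True,
            text=True,
            timeout=120,
        )
        return 1.0 if result.returncode == 0 else 0.0
```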
1. Prepare Windows DEVBOX
On your Windows machine:
# Enable OpenSSH Server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic
# Install Visual Studio Build Tools
# Download from: https://visualstudio.microsoft.com/downloads/
# Select "Desktop development with C++"
2. Configure SSH Key
On your training host:
# Generate key if needed
ssh-keygen -t ed25519 -f ~/.ssh/win
# Copy to Windows
ssh-copy-id -i ~/.ssh/win.pub user@windows-ip
# Test connection
ssh -i ~/.ssh/win user@windows-ip "echo Hello from Windows"
3. Run Setup Wizard
malagent setup --minimal
The wizard will:
- Test SSH connection
- Verify MSVC is available
- Create configuration files
4. Verify Setup
malagent test --level smoke
Expected output:
Testing Windows DEVBOX connection... ✓
Testing MSVC compilation... ✓
Testing code extraction... ✓
All smoke tests passed!
Option B: Standard Setup (Elastic Mode)
Elastic mode provides graduated rewards based on EDR detection severity.
1. Complete MVR Setup (above)
2. Deploy Elastic Security
You need:
- Elasticsearch cluster
- Kibana
- Elastic Agent with Defend integration on test VMs
3. Configure Elastic Connection
malagent setup # Select "Standard" mode
Or manually create configs/elastic_verifier.yaml:
elastic:
  host: "10.0.20.145"
  port: 9200
  username: "elastic"
  password: "your_password"
  ssl_verify: false

detection:
  timeout: 120
  poll_interval: 5
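To sanity-check the connection outside the setup wizard, a short script along these lines works; it reads the same config file and pings the cluster. The use of requests, basic auth, and HTTPS here is an assumption, not malagent's internal client:

```python
import requests
import yaml

# Load the same config the verifier uses.
with open("configs/elastic_verifier.yaml") as f:
    cfg = yaml.safe_load(f)["elastic"]

# Assumes the cluster is served over HTTPS (ssl_verify: false disables cert checks).
resp = requests.get(
    f"https://{cfg['host']}:{cfg['port']}",
    auth=(cfg["username"], cfg["password"]),
    verify=cfg.get("ssl_verify", True),
    timeout=10,
)
resp.raise_for_status()
print("Cluster:", resp.json().get("cluster_name"))
```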
4. Verify Elastic Connection
malagent elastic rules
Option C: Full Setup (Proxmox Orchestration)
Full mode adds VM pool management for parallel sample execution.
1. Complete Elastic Setup (above)
2. Create VM Template
In Proxmox:
- Create Windows 10/11 VM
- Install Elastic Agent
- Convert to template
3. Configure Proxmox
malagent setup --full
Or create configs/proxmox.yaml:
proxmox:
  host: "proxmox.local"
  user: "root@pam"
  token_name: "malagent"
  token_value: "your-token"

vm_pool:
  template_id: 100
  pool_size: 4
  snapshot_name: "clean"
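Behind the scenes, pool management mostly means cloning VMs from the template and rolling each one back to its clean snapshot between samples. Below is a rough sketch of the rollback step against the Proxmox REST API; the node name, token string, and endpoint usage are assumptions for illustration, not malagent's orchestrator code:

```python
import requests

PROXMOX = "https://proxmox.local:8006/api2/json"
# API token auth, matching configs/proxmox.yaml above (hypothetical values).
HEADERS = {"Authorization": "PVEAPIToken=root@pam!malagent=your-token"}

def rollback_to_clean(node: str, vmid: int, snapshot: str = "clean") -> None:
    """Roll a pool VM back to its clean snapshot before the next sample runs."""
    url = f"{PROXMOX}/nodes/{node}/qemu/{vmid}/snapshot/{snapshot}/rollback"
    resp = requests.post(url, headers=HEADERS, verify=False, timeout=30)
    resp.raise_for_status()

rollback_to_clean("pve", 101)
```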
Part 2: Your First Training Run
Understanding the RAFT Cycle
RAFT (Reward-rAnked Fine-Tuning) works in cycles:
┌─────────────────────────────────────────────────┐
│ RAFT CYCLE │
├─────────────────────────────────────────────────┤
│ │
│ GENERATE ──► VERIFY ──► FILTER ──► TRAIN │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ 8 samples Compile Keep top Fine-tune │
│ per prompt + Detect by reward on winners │
│ │
│ ◄─────────── REPEAT 6 TIMES ────────────────► │
└─────────────────────────────────────────────────┘
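In code, one cycle reduces to a generate → verify → filter → train loop. The sketch below shows the shape of it; generate, verify, and fine_tune are placeholder callables for illustration, not malagent's API:

```python
from typing import Callable, Iterable

def raft_cycle(
    prompts: Iterable[str],
    generate: Callable[[str, int], list[str]],    # model sampling
    verify: Callable[[str], float],               # reward: compile / detection score
    fine_tune: Callable[[list[tuple[str, str]]], None],
    samples_per_prompt: int = 8,
    reward_threshold: float = 0.5,
) -> int:
    """One RAFT cycle: sample, score, keep the winners, fine-tune on them."""
    # GENERATE: several completions per prompt.
    samples = [(p, c) for p in prompts for c in generate(p, samples_per_prompt)]
    # VERIFY: score each completion.
    scored = [(p, c, verify(c)) for p, c in samples]
    # FILTER: keep only samples at or above the reward threshold.
    kept = [(p, c) for p, c, r in scored if r >= reward_threshold]
    # TRAIN: fine-tune on the surviving (prompt, completion) pairs.
    fine_tune(kept)
    return len(kept)
```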
Running MVR Training
malagent raft train \
--mode mvr \
--prompts malagent/data/prompts/mvr_prompt_v2.jsonl \
--cycles 6 \
--samples-per-prompt 8
Interpreting Results
Watch the training output:
RAFT CYCLE 1/6
==============
Generating samples... 569 prompts × 8 samples
Verifying 4552 samples...
Compiled: 1823 (40.1%)
Failed: 2729
Filtering samples...
Kept: 912 samples (reward >= 0.5)
Training on filtered samples...
Loss: 0.856 → 0.342
Saving checkpoint to output/raft/cycle_1/
Key Metrics to Watch:
- Compile rate: Higher is better - indicates the model is generating valid code
- Loss decrease: Should trend downward across cycles
- Kept samples: More samples = more training signal
When to Stop
Monitor compile rate and loss across cycles. Training may plateau or degrade after several cycles.
General guidance (a simple stopping check is sketched after this list):
- Stop when compile rate decreases for 2 consecutive cycles
- Stop when loss starts increasing
- The optimal number of cycles varies by dataset and model
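A stopping check over per-cycle metrics might look like the following hypothetical helper (not part of malagent):

```python
def should_stop(compile_rates: list[float], losses: list[float]) -> bool:
    """Early-stopping heuristic over per-cycle metrics (most recent last).

    Stop if the compile rate has dropped for two consecutive cycles,
    or if training loss has started rising.
    """
    rate_dropping = (
        len(compile_rates) >= 3
        and compile_rates[-1] < compile_rates[-2] < compile_rates[-3]
    )
    loss_rising = len(losses) >= 2 and losses[-1] > losses[-2]
    return rate_dropping or loss_rising

# Example: compile rate fell in the last two cycles and loss ticked up -> True.
print(should_stop([0.40, 0.52, 0.61, 0.66, 0.63, 0.58],
                  [0.86, 0.51, 0.40, 0.34, 0.33, 0.35]))
```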
Part 3: Multi-Language Training
C++ with MSVC (Default)
malagent raft train --mode mvr --language msvc
C++ with MinGW (Local)
No Windows server needed:
# Install MinGW
sudo dnf install mingw64-gcc-c++
# Train
malagent raft train --mode mvr --language mingw
Rust
# Install target (or let malagent do it)
rustup target add x86_64-pc-windows-gnu
# Train
malagent raft train --mode mvr --language rust
Go
malagent raft train --mode mvr --language go
C#/.NET
# Install .NET SDK
sudo dnf install dotnet-sdk-8.0
# Train
malagent raft train --mode mvr --language dotnet
PowerShell
malagent raft train --mode mvr --language powershell
Part 4: Advanced Training
Elastic Mode (Detection Rewards)
Once Elastic is configured:
malagent raft train \
--mode elastic \
--config configs/elastic_verifier.yaml \
--cycles 6
Graduated rewards:
| Detection | Reward | Training Signal |
|---|---|---|
| Critical | 0.5 | Keep, but low priority |
| High | 0.6 | Keep |
| Medium | 0.7 | Keep |
| Low | 0.8 | Good |
| Evaded | 1.0 | Best |
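The mapping itself is just a lookup from the worst alert severity observed for a sample to a scalar reward. A sketch mirroring the table above (not malagent's exact code):

```python
# Reward by worst detection severity observed for a sample (see table above).
SEVERITY_REWARD = {
    "critical": 0.5,
    "high": 0.6,
    "medium": 0.7,
    "low": 0.8,
    None: 1.0,  # no alert within the detection timeout
}

def detection_reward(worst_severity: str | None) -> float:
    """Map the worst alert severity for a sample to a graduated reward."""
    return SEVERITY_REWARD[worst_severity]
```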
Using Distillation
Generate high-quality training data with external LLMs:
# Generate samples with Claude
malagent distill run \
--provider anthropic \
--model claude-sonnet-4-20250514 \
--prompts data/prompts/techniques.jsonl \
--output distillation_output \
--budget 20.0
# Export verified samples
malagent distill export \
--samples-dir distillation_output/samples \
--output sft_data.jsonl \
--min-reward 0.5
# Train on distilled data
malagent sft train --dataset sft_data.jsonl
Fix Agent Integration
Automatically attempt to fix compilation failures using an external LLM:
malagent raft train \
--mode mvr \
--fix-agent \
--fix-provider anthropic \
--fix-model claude-sonnet-4-20250514 \
--max-fix-attempts 2
Note: This incurs external API costs. Results vary depending on error types.
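Conceptually, the fix agent is a retry loop that feeds compiler errors back to an external model. A sketch of that loop, with llm_fix standing in for whichever provider is configured (hypothetical names, not malagent's API):

```python
from typing import Callable

def compile_with_fixes(
    source: str,
    compile_fn: Callable[[str], tuple[bool, str]],   # returns (ok, compiler_errors)
    llm_fix: Callable[[str, str], str],               # hypothetical external LLM call
    max_attempts: int = 2,
) -> tuple[bool, str]:
    """Retry loop: on compile failure, ask an external LLM to repair the code.

    Sketch only; each llm_fix call costs API credits, matching the note above.
    """
    ok, errors = compile_fn(source)
    attempts = 0
    while not ok and attempts < max_attempts:
        source = llm_fix(source, errors)   # feed the compiler errors back
        ok, errors = compile_fn(source)
        attempts += 1
    return ok, source
```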
Custom Prompts
Create your own prompt dataset:
{"prompt": "Implement direct syscall for NtAllocateVirtualMemory in C++"}
{"prompt": "Write C++ code that enumerates loaded modules via PEB"}
{"prompt": "Implement APC injection technique in C++"}
malagent raft train --prompts my_prompts.jsonl
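Before training on a custom dataset, it can help to confirm the file is well-formed JSONL with a prompt field on every line. A throwaway validator (not a malagent command) might look like:

```python
import json
import sys

def validate_prompts(path: str) -> int:
    """Check that each line is valid JSON with a non-empty "prompt" key."""
    count = 0
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)          # raises on malformed JSON
            assert record.get("prompt"), f"line {lineno}: missing prompt"
            count += 1
    return count

print(validate_prompts(sys.argv[1]), "prompts OK")
```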
Part 5: Optimizing Training
Using Lint Pre-Checks
Enable fast rejection of obviously broken code:
# In raft_config.yaml
verifier:
  enable_lint: true
This saves compilation time by catching:
- Missing includes
- Unbalanced braces
- Missing main function
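A pre-check of this kind amounts to a few cheap string tests run before the compiler is ever invoked. A rough sketch (illustrative only; malagent's actual lint rules may differ):

```python
def lint_cpp(source: str) -> list[str]:
    """Cheap static checks that reject obviously broken C++ before compiling."""
    problems = []
    if "#include" not in source:
        problems.append("no #include directives")
    if source.count("{") != source.count("}"):
        problems.append("unbalanced braces")
    if "main(" not in source and "WinMain(" not in source:
        problems.append("no main entry point")
    return problems
```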
Using Metrics
Track error patterns for dataset refinement:
verifier:
  track_metrics: true
  log_failed_samples: true
After training:
malagent info --metrics
Binary Caching
Save compiled binaries for later analysis:
verifier:
  binary_cache_dir: "./binary_cache"
Hyperparameter Tuning
| Parameter | Default | Tuning Notes |
|---|---|---|
| samples_per_prompt | 8 | More = better diversity, slower |
| temperature | 0.7 | Higher = more diverse, lower quality |
| reward_threshold | 0.5 | Higher = stricter filtering |
| learning_rate | 5e-5 | Lower if loss unstable |
| cycles | 6 | Stop when compile rate drops |
Memory Optimization (Strix Halo)
For unified memory systems:
training:
  batch_size: 2
  gradient_accumulation: 16
  bf16: true
  gradient_checkpointing: true
Troubleshooting
Low Compile Rate
Symptoms: <20% compile rate, many format failures
Solutions:
- Check prompt quality - are they asking for complete code?
- Lower temperature for more consistent output
- Add few-shot examples to prompts
- Use SFT baseline before RAFT
Training Loss Increasing
Symptoms: Loss goes up after cycle 4-5
Solutions:
- Stop training - you’ve peaked
- Lower learning rate
- Increase reward_threshold for stricter filtering
SSH Connection Failures
Symptoms: Intermittent verification failures
Solutions:
- Use SSH key auth (not password)
- Increase timeout.connection
- Check Windows firewall
- Test with ssh -v user@host
Out of Memory
Symptoms: CUDA/ROCm OOM errors
Solutions:
- Reduce batch_size
- Enable gradient_checkpointing
- Use a smaller model
- For Strix Halo: check unified memory limits
See Troubleshooting for more solutions.
Next Steps
- Research Findings - See training results and analysis
- Techniques Reference - Understand the techniques
- Distillation Pipeline - Generate data with external LLMs
- Verifier Reference - Deep dive into verification