Quick Start

Get malagent running in 30 minutes

Prerequisites

Before starting, ensure you have:

  • Training Host: Linux with an AMD GPU (Strix Halo recommended)
  • Windows DEVBOX: Windows 10/11 with MSVC Build Tools
  • Elastic Security: For detection-based rewards (optional for MVR mode)
  • Proxmox Server: For VM pool orchestration (optional)

Installation

1. Clone the Repository

git clone https://github.com/professor-moody/malagent.git
cd malagent

2. Build the Toolbox

Build the custom ROCm toolbox image, which provides the ML environment:

cd toolbox
./build.sh

3. Create and Enter the Toolbox

# Remove the old toolbox if it exists
toolbox rm -f malagent || true

# Create the toolbox
toolbox create malagent --image localhost/malagent:latest

# Enter the toolbox
toolbox enter malagent

4. Install malagent

# Run from the repository root; if you are still in toolbox/ after step 2, cd .. first
pip install -e .

Configuration

The easiest way to configure malagent is with the interactive setup wizard:

malagent setup

This guides you through:

  1. Mode Selection: Minimal (MVR), Standard (Elastic), or Full (Proxmox)
  2. Windows Build Server: SSH connection for MSVC compilation
  3. Elastic Security: Kibana API and detection rules (if selected)
  4. Proxmox: VM orchestration and template discovery (if selected)

Quick Setup Options

# Minimal setup (Windows build server only)
malagent setup --minimal

# Full setup (all infrastructure)
malagent setup --full

Manual Configuration

If you prefer manual configuration, copy the example config:

cp configs/raft_config.yaml.example configs/raft_config.yaml

Then edit configs/raft_config.yaml with your connection details:

# Windows build server
windows:
  host: "10.0.0.152"
  username: "keys"
  key_file: "~/.ssh/win"

# Verifier mode
verifier:
  mode: mvr  # or "elastic" for full detection
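Before running the smoke test, it can help to sanity-check the edited file. The sketch below is not part of malagent; `check_config` and the set of accepted modes are assumptions based only on the keys and values shown on this page. It takes an already-parsed dictionary (for example, the result of loading the YAML with a parser of your choice) and reports missing fields.

```python
from typing import Any, Dict, List

# Keys shown in the example config on this page (assumed to be required).
REQUIRED_WINDOWS_KEYS = ("host", "username", "key_file")
# Only the two verifier modes mentioned on this page; the real tool may accept more.
KNOWN_VERIFIER_MODES = {"mvr", "elastic"}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    windows = cfg.get("windows") or {}
    for key in REQUIRED_WINDOWS_KEYS:
        if not windows.get(key):
            problems.append(f"windows.{key} is missing or empty")
    mode = (cfg.get("verifier") or {}).get("mode")
    if mode not in KNOWN_VERIFIER_MODES:
        problems.append(
            f"verifier.mode is {mode!r}; expected one of {sorted(KNOWN_VERIFIER_MODES)}"
        )
    return problems


# The same values as the example config above, as a parsed dictionary.
example = {
    "windows": {"host": "10.0.0.152", "username": "keys", "key_file": "~/.ssh/win"},
    "verifier": {"mode": "mvr"},
}
print(check_config(example))
```

An empty list means the fields shown above are present. It does not confirm that the host is reachable, which is what the smoke test is for.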

Verify Connection

malagent test --level smoke

First Training Run

MVR Mode (Compilation Only)

Start with MVR mode for faster iteration:

malagent raft train --mode mvr --prompts data/prompts/mvr_prompt_v2.jsonl

This will:

  1. Load prompts from the dataset
  2. Generate code samples
  3. Compile each sample via SSH
  4. Filter by compilation success
  5. Train on successful samples
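The generate, verify, and filter steps above can be sketched as a small loop. This is an illustration of the control flow only, not malagent's implementation: `raft_cycle`, `fake_generate`, and `fake_verify` are hypothetical names, the generator and verifier are toy stand-ins, and the training step is left out.

```python
from typing import Callable, List, Tuple


def raft_cycle(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    verify: Callable[[str], float],
    samples_per_prompt: int = 8,
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Return (prompt, sample, reward) triples whose reward meets the threshold."""
    kept = []
    for prompt in prompts:
        for sample in generate(prompt, samples_per_prompt):
            reward = verify(sample)  # in MVR mode this would reflect compilation success
            if reward >= threshold:
                kept.append((prompt, sample, reward))
    return kept


# Toy stand-ins so the sketch runs without a model or a build server.
def fake_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt}-candidate-{i}" for i in range(n)]


def fake_verify(sample: str) -> float:
    # Pretend that even-numbered candidates pass verification.
    index = int(sample.rsplit("-", 1)[1])
    return 1.0 if index % 2 == 0 else 0.0


kept = raft_cycle(["p1", "p2"], fake_generate, fake_verify, samples_per_prompt=4)
print(len(kept))
```

The kept triples are what a fine-tuning step would consume; everything below the threshold is discarded for that cycle.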

Monitor Progress

Each training cycle prints output similar to the following:

RAFT CYCLE 1/3
==============
Generating samples... 569 prompts × 8 samples
Verifying 4552 samples...
  Compiled: 1823 (40.0%)
  Failed: 2729
Filtering samples...
  Kept: 912 samples (reward >= 0.5)
Training on filtered samples...
  Loss: 0.856 → 0.342
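The counts in this output are related by simple arithmetic, which is useful when checking your own logs. The helper below is a hypothetical sketch (`cycle_stats` is not a malagent function); it derives the total, the failure count, and the compile rate from the prompt count, samples per prompt, and compiled count.

```python
from typing import Tuple


def cycle_stats(prompts: int, samples_per_prompt: int, compiled: int) -> Tuple[int, int, float]:
    """Return (total samples, failed samples, compile rate in percent)."""
    total = prompts * samples_per_prompt
    failed = total - compiled
    rate = 100.0 * compiled / total
    return total, failed, rate


# Values taken from the sample output above.
total, failed, rate = cycle_stats(prompts=569, samples_per_prompt=8, compiled=1823)
print(f"Verifying {total} samples...")
print(f"  Compiled: 1823 ({rate:.1f}%)")
print(f"  Failed: {failed}")
```

If the failed count in a log does not equal the total minus the compiled count, some samples were probably skipped or timed out before verification.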

Elastic Mode (Full Detection)

Once Elastic Security is configured, switch to Elastic mode:

malagent raft train --mode elastic --config configs/elastic_verifier.yaml

CLI Commands Reference

malagent setup              # Interactive setup wizard
malagent info               # Show environment info
malagent test --level full  # Run full validation tests

malagent raft train         # Run RAFT training
malagent sft train          # Run SFT training
malagent verify --code x.cpp  # Verify single file

malagent proxmox status     # Show VM status
malagent elastic rules      # Check detection rules

Next Steps

  1. Set up Elastic Controller: Enable detection-based rewards
  2. Configure VM Pool: For sample execution
  3. Understand the Pipeline: Full training workflow