Reward Configuration

Configurable reward signals for training

Reward Modes

malagent supports two reward modes, configurable via --reward-mode:

Mode        Behavior
binary      0.0 for compile failure, 0.7 for any success
graduated   0.0-1.0 scale based on detection severity and other factors

Verification Modes

Mode      Verification         Typical Use
mvr       Compilation only     Early training phases
elastic   Full EDR detection   Evasion training

Graduated Reward Factors

When using graduated mode with Elastic verification, the reward starts at 1.0 and is reduced by penalties based on:

  • Detection severity
  • Number of rules triggered
  • Behavioral vs signature detection
  • MITRE technique difficulty
  • Detection latency
  • Rule source (ML, custom, prebuilt)
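The start-at-1.0-then-subtract scheme can be sketched as follows. This is only an illustration of the arithmetic, not malagent's implementation: the `DetectionSummary` type, the penalty weights, and the function name are all hypothetical, only two of the six factors are modeled, and returning 0.0 on compile failure in graduated mode is an assumption carried over from binary mode.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical penalty weights, chosen for illustration only.
# The actual values used by malagent are not given in this document.
SEVERITY_PENALTY = {"low": 0.1, "medium": 0.2, "high": 0.3, "critical": 0.4}
PER_RULE_PENALTY = 0.05


@dataclass
class DetectionSummary:
    """Hypothetical summary of a verification run."""
    max_severity: Optional[str]  # None means no alert was raised
    rules_triggered: int


def graduated_reward(compiled: bool, summary: DetectionSummary) -> float:
    """Start at 1.0, subtract penalties, and clamp to the 0.0-1.0 scale."""
    if not compiled:
        return 0.0  # assumption: same floor as binary mode
    reward = 1.0
    if summary.max_severity is not None:
        reward -= SEVERITY_PENALTY[summary.max_severity]
    reward -= PER_RULE_PENALTY * summary.rules_triggered
    return max(0.0, min(1.0, reward))
```

Clamping keeps the result on the documented 0.0-1.0 scale even when several penalties stack.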

RAFT Threshold

Samples with reward >= threshold are kept for training. The default threshold is 0.5, so a binary-mode success (0.7) passes the filter.
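A minimal sketch of the threshold filter, assuming each sample is a dict with a "reward" key. The sample layout and the `filter_samples` name are hypothetical; only the `>=` comparison and the 0.5 default come from the text above.

```python
from typing import Dict, List

DEFAULT_THRESHOLD = 0.5  # default stated in this section


def filter_samples(samples: List[Dict], threshold: float = DEFAULT_THRESHOLD) -> List[Dict]:
    """Keep only samples whose reward is at or above the threshold."""
    return [s for s in samples if s["reward"] >= threshold]


# Hypothetical sample records for illustration.
batch = [
    {"id": "a", "reward": 0.0},
    {"id": "b", "reward": 0.5},
    {"id": "c", "reward": 0.7},
]
kept = filter_samples(batch)
```

The comparison is inclusive, so a sample scoring exactly at the threshold is kept.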