Reward Configuration
Configurable reward signals for training
Reward Modes
malagent supports two reward modes, configurable via --reward-mode:
| Mode | Behavior |
|---|---|
| binary | 0.0 for compile failure, 0.7 for any success |
| graduated | 0.0-1.0 scale based on detection severity and other factors |
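
The binary row above can be read as a two-valued mapping. The following is a minimal sketch of that mapping only; the function name `binary_reward` and its boolean argument are hypothetical and are not taken from malagent's code.

```python
# Hypothetical helper illustrating the binary row of the table above.
# The name and signature are assumptions, not malagent's actual API.
def binary_reward(compiled: bool) -> float:
    """Return 0.7 for any successful compile, 0.0 for a compile failure."""
    return 0.7 if compiled else 0.0


print(binary_reward(True))   # 0.7
print(binary_reward(False))  # 0.0
```
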
Verification Modes
| Mode | Verification | Typical Use |
|---|---|---|
| mvr | Compilation only | Early training phases |
| elastic | Full EDR detection | Evasion training |
Graduated Reward Factors
In graduated mode with Elastic verification, the reward starts at 1.0 and is reduced by penalties based on:
- Detection severity
- Number of rules triggered
- Behavioral vs signature detection
- MITRE technique difficulty
- Detection latency
- Rule source (ML, custom, prebuilt)
RAFT Threshold
Samples with reward >= threshold are kept for training. Default threshold: 0.5.
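
The threshold rule above amounts to a simple filter over scored samples. The sketch below assumes a hypothetical `Sample` record and `filter_for_training` helper, neither of which is malagent's actual API; only the `>=` comparison and the 0.5 default come from the text.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.5  # default stated in the text above


@dataclass
class Sample:
    # Hypothetical record type: an identifier plus its assigned reward.
    sample_id: str
    reward: float


def filter_for_training(samples, threshold=DEFAULT_THRESHOLD):
    """Keep samples whose reward is at or above the threshold (inclusive)."""
    return [s for s in samples if s.reward >= threshold]


samples = [
    Sample("a", 0.0),
    Sample("b", 0.7),
    Sample("c", 0.5),   # exactly at the threshold, so it is kept
    Sample("d", 0.49),
]
kept = filter_for_training(samples)
print([s.sample_id for s in kept])  # ['b', 'c']
```

Because the comparison is inclusive, a sample scoring exactly 0.5 is kept. In binary mode, a successful sample scores 0.7, so it passes the default threshold.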