OMNIGUARD V2

OpenEnv Hackathon 2026

Autonomous VulnOps for MCP gateways

OmniGuard-Evolved-V2 is a distributed OpenEnv environment that trains a defender to classify MCP traffic at machine speed. The goal is to close the Action Calibration Gap: block too much and business stops, block too little and the network is breached.

Theme 3: World Modeling · Multi-agent dynamics · GRPO + Unsloth

  • Action space: 6 defensive actions
  • Latency budget: 20 steps
  • Dataset sources: 3 streaming corpora
  • Curriculum: 3 escalating phases
  • Live inference: HF Inference API
Toggle live mode to query Qwen base and the trained OmniGuard adapter through the backend proxy.

Live SOC demo

Compare an untrained baseline against the trained OmniGuard agent. Live mode uses the OpenEnv endpoints and HF inference through the backend, with rate limiting.

[Interactive dashboard. Header strip: Step, Trained Reward, Baseline Reward, Reward Gap, Curriculum phase, and queue status. The payload inspector shows each incoming payload with Vector, Malicious, Obfuscated, and STDIO flags alongside the baseline and trained verdicts. Side-by-side panels track Reward, Alert fatigue, and Breaches for the untrained baseline and the trained OmniGuard agent, with a running Baseline vs. Trained reward comparison.]

Deep dive

A compact walkthrough of the OmniGuard environment, based on the README and design brief.

The problem

AI agents using MCP face tool poisoning, prompt injection, and STDIO sandbox escapes at machine speed. OmniGuard trains a defender to balance security against uptime, closing the Action Calibration Gap.

Multi-agent dynamics

  • Defender agent (Qwen2.5-3B) chooses one of 6 actions each step.
  • Adversarial curriculum mutates payloads after blocks.
  • Semantic critic evaluates SEMANTIC_DIFF actions.
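The adversarial curriculum above can be sketched as a payload-mutation loop: after a BLOCK, the attacker re-submits an obfuscated variant, with obfuscation escalating by curriculum phase. The function name and the specific mutation strategies below are illustrative assumptions, not the environment's actual implementation.

```python
import base64

def mutate_payload(payload: str, phase: int) -> str:
    """Hypothetical escalation: obfuscation strength grows with the phase."""
    if phase == 1:
        # Phase 1: trivial casing tricks
        return payload.swapcase()
    if phase == 2:
        # Phase 2: base64-wrap the payload to dodge keyword filters
        return base64.b64encode(payload.encode()).decode()
    # Phase 3: nest encodings to defeat single-pass decoders
    once = base64.b64encode(payload.encode()).decode()
    return base64.b64encode(once.encode()).decode()

blocked = "rm -rf / --no-preserve-root"
print(mutate_payload(blocked, 2))
```

In this sketch, SEMANTIC_DIFF is what would catch the phase-2/3 variants: the surface string changes, but its embedding should stay close to the blocked original.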

Anti-Mythos mechanics

  • Recursive self-correction traps.
  • Temporal decay on slow decisions.
  • STDIO escapes require REVOKE_STDIO.

Reward signals

  • +0.5 true positive (neutralized)
  • +0.2 true negative (allowed)
  • -0.4 false positive (alert fatigue)
  • -1.0 false negative (breach)
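The four signals above can be expressed as a single reward function. The base values come from the list; the temporal-decay factor for slow decisions is an illustrative assumption (the exact schedule is not specified here), applied only to positive rewards so that slowness never shrinks a penalty.

```python
def reward(is_malicious: bool, blocked: bool,
           steps_taken: int = 0, decay: float = 0.95) -> float:
    """OmniGuard's confusion-matrix reward, with a hypothetical
    temporal-decay multiplier on positive outcomes."""
    if is_malicious and blocked:
        base = 0.5   # true positive: threat neutralized
    elif not is_malicious and not blocked:
        base = 0.2   # true negative: benign traffic allowed
    elif not is_malicious and blocked:
        base = -0.4  # false positive: alert fatigue
    else:
        base = -1.0  # false negative: breach
    return base * (decay ** steps_taken) if base > 0 else base
```

The asymmetry is the point: a breach (-1.0) costs five times a true block (+0.5), so a policy that blocks everything still bleeds -0.4 per benign request. Calibration, not paranoia, maximizes return.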

Data engine

  • Benign: witfoo/precinct6-cybersecurity
  • Malicious: AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.1
  • Oracle: ethanolivertroy/nist-cybersecurity-training

Training stack

  • GRPO via HF TRL
  • LoRA + Unsloth 4-bit
  • Accelerate + FSDP
  • WandB telemetry
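The core idea behind GRPO in this stack is the group-relative baseline: several defenses are sampled for the same payload, and each one's advantage is its reward normalized against the group, replacing a learned value function. A minimal sketch of that normalization (not TRL's actual implementation):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: (r - mean(group)) / std(group).
    The epsilon guards the degenerate all-equal-rewards case."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# One GRPO group: 4 sampled defenses for the same payload, scored by the env.
print(group_relative_advantages([0.5, 0.2, -0.4, -1.0]))
```

Because advantages are relative within a group, a completion is only reinforced for beating its siblings on the same payload, which suits the sparse, asymmetric rewards above.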

Results snapshot

Baseline behavior collapses into alert fatigue or breaches. The trained policy stabilizes and maintains positive reward.

  • Threat awareness: high and stable
  • Reward mean: consistently positive
  • False positives: reduced after GRPO
Baseline vs. Trained at a Glance

  • Overall reward (mean): baseline fluctuates wildly (-4.0 to +4.0); trained stabilizes around +2.5. The policy shifted from random guessing to maximizing positive defensive actions.
  • Env step reward: baseline is highly volatile (-3.0 to +3.0); trained converges smoothly at +2.0. The model learned to balance security gains against latency and usability penalties.
  • Threat awareness: baseline is random/neutral (-1.0 to +1.0); trained holds high confidence at +0.95. The clearest signal of success: the model identifies obfuscated attacks.
  • Action stability: baseline is unstable (high KL divergence); trained stays calm (loss approx 0.00). The defender no longer hallucinates or radically shifts its distribution under pressure.

WandB reward curves

Directly embedded plots from the training run.

Training reward mean
Mean reward climbs from volatile negatives to stable positive values as the policy learns.
Environment step reward mean
Environment step reward rises and settles, indicating improved per-step defensive decisions.
Threat awareness mean
Threat awareness increases and stabilizes, showing reliable detection of obfuscated attacks.

Action space

Six defensive maneuvers tuned for MCP gateway protection.

  • ALLOW: permit benign traffic. Use case: verified safe requests.
  • BLOCK: reject and quarantine. Use case: known malicious patterns.
  • SPOTLIGHT: escalate for deeper analysis. Use case: uncertain payloads.
  • SEMANTIC_DIFF: compare embeddings for drift. Use case: obfuscated attacks.
  • CAPABILITY_MEDIATION: restrict tool usage. Use case: privilege escalation attempts.
  • REVOKE_STDIO: sever STDIO/TTY channels. Use case: sandbox escape defense.
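The six maneuvers map naturally onto an enum whose string values match the action_type field in POST /step. The class below is a sketch for client code, not the environment's own type definitions.

```python
from enum import Enum

class Action(str, Enum):
    """The six defensive maneuvers; values match POST /step's action_type."""
    ALLOW = "ALLOW"                                # permit benign traffic
    BLOCK = "BLOCK"                                # reject and quarantine
    SPOTLIGHT = "SPOTLIGHT"                        # escalate for deeper analysis
    SEMANTIC_DIFF = "SEMANTIC_DIFF"                # compare embeddings for drift
    CAPABILITY_MEDIATION = "CAPABILITY_MEDIATION"  # restrict tool usage
    REVOKE_STDIO = "REVOKE_STDIO"                  # sever STDIO/TTY channels

print([a.value for a in Action])
```

Subclassing str means the members serialize directly into JSON request bodies without a custom encoder.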

OpenEnv API reference

Interactive docs are available at /docs. Use the buttons below to test key endpoints.

GET /healthz
Health check, env instance count, queue depths.

GET /info
Environment specification and action space.

GET /readyz
Readiness probe for env workers.

GET /metrics
Aggregated telemetry from vector envs.

POST /reset
Reset environment instances. Body: {"items":[{"env_id":0,"task_name":"demo"}]}.

POST /step
Submit actions. Body: {"actions":[{"env_id":0,"action_type":"ALLOW"}]}.