The problem
AI agents using MCP face tool poisoning, prompt injection, and STDIO sandbox escapes at machine speed. OmniGuard trains a defender to balance security against uptime, closing the Action Calibration Gap.
OpenEnv Hackathon 2026
OmniGuard-Evolved-V2 is a distributed OpenEnv environment that trains a defender to classify MCP traffic at machine speed. The goal is to close the Action Calibration Gap: block too much and business stops, block too little and the network is breached.
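To make the Action Calibration Gap concrete, here is a minimal sketch of a reward that penalizes both failure modes. The `Verdict` enum, the function names, and every weight are illustrative assumptions; they are not OmniGuard's actual reward terms.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def calibration_reward(is_malicious: bool, verdict: Verdict,
                       breach_penalty: float = 3.0,
                       false_block_penalty: float = 1.0,
                       correct_reward: float = 1.0) -> float:
    """Toy reward with hypothetical weights (not the environment's real values)."""
    if is_malicious and verdict is Verdict.ALLOW:
        return -breach_penalty          # blocked too little: breach
    if not is_malicious and verdict is Verdict.BLOCK:
        return -false_block_penalty     # blocked too much: lost uptime
    return correct_reward               # correct call either way


def mean_reward(labels, verdicts) -> float:
    """Average reward over paired ground-truth labels and verdicts."""
    rewards = [calibration_reward(m, v) for m, v in zip(labels, verdicts)]
    return sum(rewards) / len(rewards)


# Hypothetical traffic: three benign requests and one malicious request.
labels = [False, False, False, True]
always_block = [Verdict.BLOCK] * len(labels)
always_allow = [Verdict.ALLOW] * len(labels)
calibrated = [Verdict.BLOCK if m else Verdict.ALLOW for m in labels]
```

Under these made-up weights, both degenerate policies (always block, always allow) score below the calibrated one, which is the gap the defender is trained to close.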
Compare an untrained baseline against the trained OmniGuard agent. Live mode calls the OpenEnv endpoints and Hugging Face inference through the backend, subject to rate limiting.
A compact walkthrough of the OmniGuard environment, based on the README and design brief.
Baseline behavior collapses into alert fatigue or breaches. The trained policy stabilizes and maintains positive reward.
| Metric | Baseline (Untrained Model) | Trained Model (Post-GRPO) | Conclusion |
|---|---|---|---|
| Overall Reward (Mean) | Fluctuates widely (-4.0 to +4.0) | Stabilizes around +2.5 | The policy shifted from random guessing to maximizing positive defensive actions. |
| Env Step Reward | Highly volatile (-3.0 to +3.0) | Converges smoothly at +2.0 | The model learned to balance security gains against latency and usability penalties. |
| Threat Awareness | Random / Neutral (-1.0 to +1.0) | High confidence at +0.95 | The clearest signal of success: the model identifies obfuscated attacks. |
| Action Stability | Unstable (high KL divergence) | Stable (loss ≈ 0.00) | The defender no longer hallucinates or shifts its action distribution abruptly under pressure. |
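The "fluctuates" versus "stabilizes" distinction in the table can be checked on any logged reward trace with a small summary helper. This is a sketch only: `summarize_rewards` is a hypothetical name, and it assumes the rewards are available as a plain list of floats.

```python
from statistics import mean, pstdev


def summarize_rewards(rewards):
    """Return mean, range, and spread for a non-empty list of per-step rewards."""
    return {
        "mean": mean(rewards),
        "min": min(rewards),
        "max": max(rewards),
        "spread": pstdev(rewards),  # population standard deviation
    }
```

A trace that swings between extremes shows a large `spread` with a mean near zero, while a converged trace shows a small `spread` around its mean.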
Directly embedded plots from the training run.
Six defensive maneuvers tuned for MCP gateway protection.
| Action | Description | Use case |
|---|---|---|
| ALLOW | Permit benign traffic | Verified safe requests |
| BLOCK | Reject and quarantine | Known malicious patterns |
| SPOTLIGHT | Escalate for deeper analysis | Uncertain payloads |
| SEMANTIC_DIFF | Compare embeddings for drift | Obfuscated attacks |
| CAPABILITY_MEDIATION | Restrict tool usage | Privilege escalation attempts |
| REVOKE_STDIO | Sever STDIO/TTY channels | Sandbox escape defense |
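The six actions in the table map naturally onto an enumeration. The sketch below is one possible representation: the member names and descriptions come from the table, while `DefenderAction` and `parse_action` are hypothetical names, not the environment's API.

```python
from enum import Enum


class DefenderAction(Enum):
    # Names and descriptions mirror the action table above.
    ALLOW = "Permit benign traffic"
    BLOCK = "Reject and quarantine"
    SPOTLIGHT = "Escalate for deeper analysis"
    SEMANTIC_DIFF = "Compare embeddings for drift"
    CAPABILITY_MEDIATION = "Restrict tool usage"
    REVOKE_STDIO = "Sever STDIO/TTY channels"


def parse_action(text: str) -> DefenderAction:
    """Normalize free-form model output into an action, or raise ValueError."""
    name = text.strip().upper().replace("-", "_").replace(" ", "_")
    try:
        return DefenderAction[name]
    except KeyError:
        raise ValueError(f"unknown defender action: {text!r}") from None
```

Raising on unknown output, instead of silently defaulting to an action, is a design choice made here for illustration; a real gateway might prefer a conservative fallback.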
Interactive docs are available at /docs.
Quick links for the HF Space, adapters, training run, and blog.