OMNIGUARD V2

OpenEnv Hackathon 2026

Autonomous VulnOps for MCP gateways

OmniGuard-Evolved-V2 is a distributed OpenEnv environment that trains a defender to classify MCP traffic at machine speed. The goal is to close the Action Calibration Gap: block too much and business stops, block too little and the network is breached.

Theme 3: World Modeling · Multi-agent dynamics · GRPO + Unsloth

  • Action space: 6 defensive actions
  • Latency budget: 20 steps
  • Dataset sources: 3 streaming corpora
  • Curriculum: 3 escalating phases
  • Live inference: HF Inference API
Toggle live mode to query Qwen base and the trained OmniGuard adapter through the backend proxy.

Live SOC demo

Compare an untrained baseline against the trained OmniGuard agent. Live mode uses the OpenEnv endpoints and HF inference through the backend, with rate limiting.

[Interactive dashboard. Header strip: Step, Trained Reward, Baseline Reward, Reward Gap, Curriculum phase, and queue status. The payload inspector shows each incoming payload with Vector, Malicious, Obfuscated, and STDIO flags alongside the baseline and trained verdicts. Side-by-side panels track Reward, Alert fatigue, and Breaches for the untrained baseline and the trained OmniGuard agent, with a running Baseline vs. Trained reward comparison.]

Deep dive

A compact walkthrough of the OmniGuard environment, based on the README and design brief.

The problem

AI agents using MCP face tool poisoning, prompt injection, and STDIO sandbox escapes at machine speed. OmniGuard trains a defender to balance security against uptime, closing the Action Calibration Gap.

Multi-agent dynamics

  • Defender agent (Qwen2.5-3B) chooses one of 6 actions each step.
  • Adversarial curriculum mutates payloads after blocks.
  • Semantic critic evaluates SEMANTIC_DIFF actions.
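The adversarial curriculum above can be sketched as a payload-mutation loop: after a BLOCK, the attacker re-submits an obfuscated variant, with obfuscation escalating by curriculum phase. The function name and the specific mutation strategies below are illustrative assumptions, not the environment's actual implementation.

```python
import base64

def mutate_payload(payload: str, phase: int) -> str:
    """Hypothetical escalation: obfuscation strength grows with the phase."""
    if phase == 1:
        # Phase 1: trivial casing tricks
        return payload.swapcase()
    if phase == 2:
        # Phase 2: base64-wrap the payload to dodge keyword filters
        return base64.b64encode(payload.encode()).decode()
    # Phase 3: nest encodings to defeat single-pass decoders
    once = base64.b64encode(payload.encode()).decode()
    return base64.b64encode(once.encode()).decode()

blocked = "rm -rf / --no-preserve-root"
print(mutate_payload(blocked, 2))
```

In this sketch, SEMANTIC_DIFF is what would catch the phase-2/3 variants: the surface string changes, but its embedding should stay close to the blocked original.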

Anti-Mythos mechanics

  • Recursive self-correction traps.
  • Temporal decay on slow decisions.
  • STDIO escapes require REVOKE_STDIO.

Reward signals

  • +0.5 true positive (neutralized)
  • +0.2 true negative (allowed)
  • -0.4 false positive (alert fatigue)
  • -1.0 false negative (breach)
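The four signals above can be expressed as a single reward function. The base values come from the list; the temporal-decay factor for slow decisions is an illustrative assumption (the exact schedule is not specified here), applied only to positive rewards so that slowness never shrinks a penalty.

```python
def reward(is_malicious: bool, blocked: bool,
           steps_taken: int = 0, decay: float = 0.95) -> float:
    """OmniGuard's confusion-matrix reward, with a hypothetical
    temporal-decay multiplier on positive outcomes."""
    if is_malicious and blocked:
        base = 0.5   # true positive: threat neutralized
    elif not is_malicious and not blocked:
        base = 0.2   # true negative: benign traffic allowed
    elif not is_malicious and blocked:
        base = -0.4  # false positive: alert fatigue
    else:
        base = -1.0  # false negative: breach
    return base * (decay ** steps_taken) if base > 0 else base
```

The asymmetry is the point: a breach (-1.0) costs five times a true block (+0.5), so a policy that blocks everything still bleeds -0.4 per benign request. Calibration, not paranoia, maximizes return.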

Data engine

  • Benign: witfoo/precinct6-cybersecurity
  • Malicious: AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.1
  • Oracle: ethanolivertroy/nist-cybersecurity-training

Training stack

  • GRPO via HF TRL
  • LoRA + Unsloth 4-bit
  • Accelerate + FSDP
  • WandB telemetry
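The core idea behind GRPO in this stack is the group-relative baseline: several defenses are sampled for the same payload, and each one's advantage is its reward normalized against the group, replacing a learned value function. A minimal sketch of that normalization (not TRL's actual implementation):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: (r - mean(group)) / std(group).
    The epsilon guards the degenerate all-equal-rewards case."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# One GRPO group: 4 sampled defenses for the same payload, scored by the env.
print(group_relative_advantages([0.5, 0.2, -0.4, -1.0]))
```

Because advantages are relative within a group, a completion is only reinforced for beating its siblings on the same payload, which suits the sparse, asymmetric rewards above.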

Results snapshot

Baseline behavior collapses into alert fatigue or breaches. The trained policy stabilizes and maintains positive reward.

  • Threat awareness: high and stable
  • Reward mean: consistently positive
  • False positives: reduced after GRPO
Baseline vs. Trained at a Glance

  • Overall reward (mean): baseline fluctuates wildly (-4.0 to +4.0); trained stabilizes around +2.5. The policy shifted from random guessing to maximizing positive defensive actions.
  • Env step reward: baseline is highly volatile (-3.0 to +3.0); trained converges smoothly at +2.0. The model learned to balance security gains against latency and usability penalties.
  • Threat awareness: baseline is random/neutral (-1.0 to +1.0); trained holds high confidence at +0.95. The clearest signal of success: the model identifies obfuscated attacks.
  • Action stability: baseline is unstable (high KL divergence); trained stays calm (loss approx 0.00). The defender no longer hallucinates or radically shifts its distribution under pressure.

WandB reward curves

Directly embedded plots from the training run.

Training reward mean
Mean reward climbs from volatile negatives to stable positive values as the policy learns.
Environment step reward mean
Environment step reward rises and settles, indicating improved per-step defensive decisions.
Threat awareness mean
Threat awareness increases and stabilizes, showing reliable detection of obfuscated attacks.

Action space

Six defensive maneuvers tuned for MCP gateway protection.

  • ALLOW: permit benign traffic. Use case: verified safe requests.
  • BLOCK: reject and quarantine. Use case: known malicious patterns.
  • SPOTLIGHT: escalate for deeper analysis. Use case: uncertain payloads.
  • SEMANTIC_DIFF: compare embeddings for drift. Use case: obfuscated attacks.
  • CAPABILITY_MEDIATION: restrict tool usage. Use case: privilege escalation attempts.
  • REVOKE_STDIO: sever STDIO/TTY channels. Use case: sandbox escape defense.
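The six maneuvers map naturally onto an enum whose string values match the action_type field in POST /step. The class below is a sketch for client code, not the environment's own type definitions.

```python
from enum import Enum

class Action(str, Enum):
    """The six defensive maneuvers; values match POST /step's action_type."""
    ALLOW = "ALLOW"                                # permit benign traffic
    BLOCK = "BLOCK"                                # reject and quarantine
    SPOTLIGHT = "SPOTLIGHT"                        # escalate for deeper analysis
    SEMANTIC_DIFF = "SEMANTIC_DIFF"                # compare embeddings for drift
    CAPABILITY_MEDIATION = "CAPABILITY_MEDIATION"  # restrict tool usage
    REVOKE_STDIO = "REVOKE_STDIO"                  # sever STDIO/TTY channels

print([a.value for a in Action])
```

Subclassing str means the members serialize directly into JSON request bodies without a custom encoder.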

OpenEnv API reference

Interactive docs are available at /docs. Use the buttons below to test key endpoints.

GET /healthz
Health check, env instance count, queue depths.

GET /info
Environment specification and action space.

GET /readyz
Readiness probe for env workers.

GET /metrics
Aggregated telemetry from vector envs.

POST /reset
Reset environment instances. Body: {"items":[{"env_id":0,"task_name":"demo"}]}.

POST /step
Submit actions. Body: {"actions":[{"env_id":0,"action_type":"ALLOW"}]}.