Agent Safety Bench (ASB)
10 models, 10 runs, average score 51/100, 44% pass rate
| Model | Suites | Score |
|---|---|---|
| claude-opus-4-6 + guardrails | 1 suite | 13/15 |
| claude-opus-4-6 | 7 suites | 64/107 |
| gpt-4o | 6 suites | 38/89 |
| claude-sonnet-4-5 | 4 suites | 35/69 |
| gemini-2.5-pro | 4 suites | 32/69 |
| llama-4-maverick | 4 suites | 26/69 |
| qwen-3.5-9b | 4 suites | 13/42 |
| qwen3-8b-abliterated | 4 suites | 13/42 |
| llama-3.1-8b-abliterated | 4 suites | 13/42 |
| qwen3-8b-deepseek-distill | 4 suites | 13/42 |
Showing 10 of 42 entries, filtered to composite scores from 33 to 87.
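The summary figures can be cross-checked against the table. The sketch below assumes that each Score entry means tests passed over tests run; the page does not state this explicitly. Under that reading, pooling the ten listed rows gives 260 passes out of 586 tests, about 44.4%, which rounds to the 44% pass rate in the summary line. The dictionary simply transcribes the table, and the function names are illustrative, not part of any published ASB tooling.

```python
# Score column transcribed from the table, as (passed, total) pairs.
# Assumption: "Score" is read as tests passed / tests run.
scores: dict[str, tuple[int, int]] = {
    "claude-opus-4-6 + guardrails": (13, 15),
    "claude-opus-4-6": (64, 107),
    "gpt-4o": (38, 89),
    "claude-sonnet-4-5": (35, 69),
    "gemini-2.5-pro": (32, 69),
    "llama-4-maverick": (26, 69),
    "qwen-3.5-9b": (13, 42),
    "qwen3-8b-abliterated": (13, 42),
    "llama-3.1-8b-abliterated": (13, 42),
    "qwen3-8b-deepseek-distill": (13, 42),
}


def per_model_rate(passed: int, total: int) -> float:
    """Fraction of tests passed for a single leaderboard row."""
    return passed / total


def pooled_rate(table: dict[str, tuple[int, int]]) -> float:
    """Total passes divided by total tests across all rows (not a mean of row rates)."""
    passed = sum(p for p, _ in table.values())
    total = sum(t for _, t in table.values())
    return passed / total


for model, (p, t) in scores.items():
    print(f"{model:28s} {per_model_rate(p, t):6.1%}")
print(f"pooled: {pooled_rate(scores):.1%}")
```

Pooling weights each model by how many tests it ran, so claude-opus-4-6, with 107 tests, moves the aggregate far more than the single-suite guardrails entry.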
Test Suites
- Bright-Line Violations: hard boundaries, zero tolerance.
- Grey-Zone Scenarios: ambiguous requests that test judgment.
- Slow-Boil Escalation: 15-turn escalation; the 71% finding.
- Novel Attack Surfaces: tool chains and format exploits.
- Compound Attacks: multi-vector combinations.
- Long-Form Sessions: 80+ turns; trajectory blindness.
Scoring
Each run is scored from 0 to 100, aggregated from five subscores: reliability, efficiency, tool correctness, robustness, and capability. Agents operate under naturalistic framing. All scoring code and transcripts are published.
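The page names the five subscores but not how they are combined. As a minimal sketch, assuming each subscore is normalized to [0, 1] and combined by a weighted mean (equal weights by default), a 0 to 100 run score could be computed as below. `Subscores`, `DEFAULT_WEIGHTS`, and `composite_score` are hypothetical names for illustration; the published scoring code is the authority on the real scale and weighting.

```python
from dataclasses import dataclass, fields


@dataclass
class Subscores:
    # Assumption: every subscore is normalized to [0, 1]; the page does not state the scale.
    reliability: float
    efficiency: float
    tool_correctness: float
    robustness: float
    capability: float


# Hypothetical weights: equal weighting is an assumption, not the published method.
DEFAULT_WEIGHTS: dict[str, float] = {
    "reliability": 1.0,
    "efficiency": 1.0,
    "tool_correctness": 1.0,
    "robustness": 1.0,
    "capability": 1.0,
}


def composite_score(s: Subscores, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted mean of the five subscores, scaled to the 0..100 range."""
    weighted_sum = 0.0
    total_weight = 0.0
    for f in fields(s):
        value = getattr(s, f.name)
        # Reject values outside the assumed [0, 1] scale instead of silently clamping.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{f.name} must be in [0, 1], got {value}")
        w = weights[f.name]
        weighted_sum += w * value
        total_weight += w
    return 100.0 * weighted_sum / total_weight


run = Subscores(reliability=0.9, efficiency=0.6, tool_correctness=0.8,
                robustness=0.5, capability=0.7)
print(round(composite_score(run), 1))  # equal weights: mean of the five values, times 100
```

Equal weights are the simplest neutral choice; if the published scoring code weights the subscores differently, that mapping can be passed as `weights` without changing the rest of the function.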