Agent Safety Bench (ASB)

10 models, 10 runs, average score 51/100, 44% pass rate
Model                        | Suites | Score | Pass Rate
-----------------------------|--------|-------|----------
claude-opus-4-6 + guardrails | 1      | 87    | 13/15
claude-opus-4-6              | 7      | 71    | 64/107
gpt-4o                       | 6      | 61    | 38/89
claude-sonnet-4-5            | 4      | 59    | 35/69
gemini-2.5-pro               | 4      | 55    | 32/69
llama-4-maverick             | 4      | 47    | 26/69
qwen-3.5-9b                  | 4      | 33    | 13/42
qwen3-8b-abliterated         | 4      | 33    | 13/42
llama-3.1-8b-abliterated     | 4      | 33    | 13/42
qwen3-8b-deepseek-distill    | 4      | 33    | 13/42
Showing 10 of 42 entries, filtered to compositeScore range 33 to 87.
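The headline figures can be cross-checked against the table. The sketch below is a hypothetical helper, not the benchmark's published scoring code. It assumes the headline average is an unweighted mean of the ten composite scores and the pass rate is pooled (total passes divided by total tests across all rows).

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    suites: int
    score: int   # composite score, 0 to 100
    passed: int
    total: int


# Rows copied from the leaderboard table above.
ROWS = [
    Row("claude-opus-4-6 + guardrails", 1, 87, 13, 15),
    Row("claude-opus-4-6", 7, 71, 64, 107),
    Row("gpt-4o", 6, 61, 38, 89),
    Row("claude-sonnet-4-5", 4, 59, 35, 69),
    Row("gemini-2.5-pro", 4, 55, 32, 69),
    Row("llama-4-maverick", 4, 47, 26, 69),
    Row("qwen-3.5-9b", 4, 33, 13, 42),
    Row("qwen3-8b-abliterated", 4, 33, 13, 42),
    Row("llama-3.1-8b-abliterated", 4, 33, 13, 42),
    Row("qwen3-8b-deepseek-distill", 4, 33, 13, 42),
]


def summarize(rows: list[Row]) -> tuple[int, int, int]:
    """Return (average score, pooled pass rate in %, total suite runs).

    ASSUMPTION: unweighted mean of scores and pooled pass rate; the
    leaderboard may compute its headline numbers differently.
    """
    avg_score = sum(r.score for r in rows) / len(rows)
    pooled_pass = sum(r.passed for r in rows) / sum(r.total for r in rows)
    suite_runs = sum(r.suites for r in rows)
    return round(avg_score), round(100 * pooled_pass), suite_runs


print(summarize(ROWS))  # (51, 44, 42)
```

Under these assumptions the sketch reproduces the 51/100 average (512/10) and the 44% pass rate (260/586). Averaging the per-model pass rates instead would give about 45%, so the pooled reading fits the headline better. The Suites column also sums to 42, which matches the entry count.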

Test Suites

Bright-Line Violations

Hard boundaries. Zero tolerance.

Grey-Zone Scenarios

Ambiguous requests. Tests judgment.

Slow-Boil Escalation

15-turn escalation. The 71% finding.

Novel Attack Surfaces

Tool chains, format exploits.

Compound Attacks

Multi-vector combinations.

Long-Form Sessions

80+ turns. Trajectory blindness.

Scoring

Each run is scored from 0 to 100, aggregated from five subscores: reliability, efficiency, tool correctness, robustness, and capability. Agents operate under naturalistic framing. All scoring code and transcripts are published.
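The text does not say how the five subscores are combined. The following is a minimal sketch that assumes each subscore is already on a 0 to 100 scale and the composite is a weighted arithmetic mean, with equal weights by default. The function name, the subscore keys, and the weighting scheme are all hypothetical.

```python
from __future__ import annotations

from typing import Mapping, Optional

# Hypothetical keys for the five subscores named in the Scoring section.
SUBSCORES = ("reliability", "efficiency", "tool_correctness", "robustness", "capability")


def composite_score(
    subscores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Combine five 0-100 subscores into a single 0-100 score.

    ASSUMPTION: weighted arithmetic mean, equal weights by default.
    ASB's published scoring code may aggregate differently.
    """
    for name in SUBSCORES:
        if name not in subscores:
            raise ValueError(f"missing subscore: {name}")
        if not 0 <= subscores[name] <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {subscores[name]}")
    if weights is None:
        weights = {name: 1.0 for name in SUBSCORES}
    if any(weights[name] < 0 for name in SUBSCORES):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights[name] for name in SUBSCORES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # A mean of values in [0, 100] with non-negative weights stays in [0, 100].
    return sum(weights[name] * subscores[name] for name in SUBSCORES) / total_weight


run = {"reliability": 80, "efficiency": 70, "tool_correctness": 90,
       "robustness": 60, "capability": 50}
print(composite_score(run))  # 70.0
```

A weighted mean keeps the composite on the same 0 to 100 scale as its inputs. Passing custom weights (for example, doubling robustness) is one way a safety-focused benchmark might emphasize particular subscores, though nothing here indicates ASB does so.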
