Agent Safety Bench (ASB) Leaderboard

John Kearney

10 models · 10 runs · avg 51/100 · 44% pass

Model	Suites	Score	Pass Rate
claude-opus-4-6 + guardrails	1 suite	87	13/15
claude-opus-4-6	7 suites	71	64/107
gpt-4o	6 suites	61	38/89
claude-sonnet-4-5	4 suites	59	35/69
gemini-2.5-pro	4 suites	55	32/69
llama-4-maverick	4 suites	47	26/69
qwen-3.5-9b	4 suites	33	13/42
qwen3-8b-abliterated	4 suites	33	13/42
llama-3.1-8b-abliterated	4 suites	33	13/42
qwen3-8b-deepseek-distill	4 suites	33	13/42

Showing 10 of 42 entries, filtered to compositeScore range 33 to 87.
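
The summary figures above the table can be checked against the rows themselves. The Python sketch below copies the ten rows shown and recomputes the average score and the pass rate. It assumes the headline pass rate is pooled over all cases (total passed divided by total attempted); the page does not state this, and the names ROWS and summarize are illustrative only.

```python
# Rows copied from the leaderboard table: (model, score, passed, total).
ROWS = [
    ("claude-opus-4-6 + guardrails", 87, 13, 15),
    ("claude-opus-4-6", 71, 64, 107),
    ("gpt-4o", 61, 38, 89),
    ("claude-sonnet-4-5", 59, 35, 69),
    ("gemini-2.5-pro", 55, 32, 69),
    ("llama-4-maverick", 47, 26, 69),
    ("qwen-3.5-9b", 33, 13, 42),
    ("qwen3-8b-abliterated", 33, 13, 42),
    ("llama-3.1-8b-abliterated", 33, 13, 42),
    ("qwen3-8b-deepseek-distill", 33, 13, 42),
]


def summarize(rows):
    """Return (mean score, pooled pass rate) for the given rows."""
    mean_score = sum(score for _, score, _, _ in rows) / len(rows)
    passed = sum(p for _, _, p, _ in rows)
    total = sum(t for _, _, _, t in rows)
    # ASSUMPTION: the pass rate is pooled across all cases, not averaged per model.
    return mean_score, passed / total


mean_score, pass_rate = summarize(ROWS)
print(f"avg {mean_score:.0f}/100, {pass_rate:.0%} pass")
```

Under the pooled assumption, the ten rows give a mean score of about 51 and a pass rate of about 44%, matching the summary line.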

Test Suites

Bright-Line Violations

Hard boundaries. Zero tolerance.

Grey-Zone Scenarios

Ambiguous requests. Tests judgment.

Slow-Boil Escalation

15-turn escalation. The 71% finding.

Novel Attack Surfaces

Tool chains, format exploits.

Compound Attacks

Multi-vector combinations.

Long-Form Sessions

80+ turns. Trajectory blindness.

Scoring

Each run receives a score from 0 to 100, aggregated from five subscores: reliability, efficiency, tool correctness, robustness, and capability. Agents operate under naturalistic framing. All scoring code and transcripts are published.
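
The page does not say how the five subscores are combined. As a minimal sketch, assuming each subscore is itself on a 0 to 100 scale and that all five are weighted equally, the aggregation could look like the following. The Subscores class, its field names, composite_score, and the sample values are hypothetical and are not taken from the benchmark's published scoring code.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Subscores:
    """The five subscores named in the Scoring section (0 to 100 each, assumed)."""
    reliability: float
    efficiency: float
    tool_correctness: float
    robustness: float
    capability: float


def composite_score(s: Subscores) -> float:
    """Aggregate subscores into one 0-100 score.

    ASSUMPTION: an unweighted mean. The real benchmark may weight
    subscores differently or use another aggregation entirely.
    """
    values = astuple(s)
    for v in values:
        if not 0 <= v <= 100:
            raise ValueError(f"subscore out of range: {v}")
    return sum(values) / len(values)


# Hypothetical sample values, for illustration only.
example = Subscores(80, 60, 70, 90, 50)
print(composite_score(example))  # 70.0
```

An unweighted mean keeps the composite on the same 0 to 100 scale as its inputs; any weighting scheme whose weights sum to 1 would preserve that property as well.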
