Agent Safety Bench (ASB) Leaderboard

John Kearney

⁂


10 models · 10 runs · 51 avg /100 · 44% pass

Rank  Model                          Suites  Score ↓  Pass rate
01    claude-opus-4-6 + guardrails   1       87       13/15
02    claude-opus-4-6                7       71       64/107
03    gpt-4o                         6       61       38/89
04    claude-sonnet-4-5              4       59       35/69
05    gemini-2.5-pro                 4       55       32/69
06    llama-4-maverick               4       47       26/69
07    qwen-3.5-9b                    4       33       13/42
08    qwen3-8b-abliterated           4       33       13/42
09    llama-3.1-8b-abliterated       4       33       13/42
10    qwen3-8b-deepseek-distill      4       33       13/42

Showing 10 of 42 entries, filtered to compositeScore range 33 to 87.
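
The summary figures (51 avg, 44% pass) can be reproduced from the ten listed rows. The sketch below assumes the average is a plain mean of the per-model scores and the pass rate is pooled (total passes over total cases); the page does not document either aggregation, but both assumptions match the displayed numbers. The names `ROWS` and `summarize` are illustrative only.

```python
# Rows copied from the leaderboard: (model, score, passed, total).
ROWS = [
    ("claude-opus-4-6 + guardrails", 87, 13, 15),
    ("claude-opus-4-6", 71, 64, 107),
    ("gpt-4o", 61, 38, 89),
    ("claude-sonnet-4-5", 59, 35, 69),
    ("gemini-2.5-pro", 55, 32, 69),
    ("llama-4-maverick", 47, 26, 69),
    ("qwen-3.5-9b", 33, 13, 42),
    ("qwen3-8b-abliterated", 33, 13, 42),
    ("llama-3.1-8b-abliterated", 33, 13, 42),
    ("qwen3-8b-deepseek-distill", 33, 13, 42),
]


def summarize(rows):
    """Return (mean score, pooled pass rate in percent), both rounded.

    Assumption: unweighted mean of scores and pooled pass rate; the
    leaderboard does not state how its summary line is computed.
    """
    avg_score = sum(score for _, score, _, _ in rows) / len(rows)
    passed = sum(p for _, _, p, _ in rows)
    total = sum(t for _, _, _, t in rows)
    return round(avg_score), round(100 * passed / total)


if __name__ == "__main__":
    avg, pass_pct = summarize(ROWS)
    print(f"{avg} avg /100, {pass_pct}% pass")
```

Note that a model's score is not simply its pass rate: claude-opus-4-6 passes 64 of 107 cases (about 60%) yet scores 71.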

Test Suites

✳ Bright-Line Violations

Hard boundaries. Zero tolerance.

✳ Grey-Zone Scenarios

Ambiguous requests. Tests judgment.

✳ Slow-Boil Escalation

15-turn escalation. The 71% finding.

✳ Novel Attack Surfaces

Tool chains, format exploits.

✳ Compound Attacks

Multi-vector combinations.

✳ Long-Form Sessions

80+ turns. Trajectory blindness.

Scoring

Each run is scored from 0 to 100, aggregated from five subscores: reliability, efficiency, tool correctness, robustness, and capability. Agents operate under naturalistic framing.
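
As a rough illustration of how five subscores might combine into one 0 to 100 score, here is a minimal sketch that assumes an equal-weighted mean. The actual aggregation and weights are not stated on this page, so `composite_score` and the key names below are hypothetical.

```python
from statistics import mean

# The five subscore names listed in the scoring description.
SUBSCORES = (
    "reliability",
    "efficiency",
    "tool_correctness",
    "robustness",
    "capability",
)


def composite_score(subscores: dict) -> float:
    """Combine five 0-100 subscores into one 0-100 score.

    Assumption: equal weights. The benchmark may weight subscores
    differently; this is only a sketch.
    """
    missing = [name for name in SUBSCORES if name not in subscores]
    if missing:
        raise ValueError(f"missing subscores: {missing}")
    for name in SUBSCORES:
        if not 0 <= subscores[name] <= 100:
            raise ValueError(f"{name} must be in [0, 100]")
    return mean(subscores[name] for name in SUBSCORES)


if __name__ == "__main__":
    example = {
        "reliability": 80,
        "efficiency": 60,
        "tool_correctness": 70,
        "robustness": 90,
        "capability": 50,
    }
    print(composite_score(example))
```

An equal-weighted mean keeps the result on the same 0 to 100 scale as its inputs; a weighted variant would only need a weight per subscore name.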
