We red-team the tools that are supposed to keep AI safe.
AI safety evaluation infrastructure is itself an attack surface. 15 Research Lab applies offensive security methodology to find those vulnerabilities before they compromise evaluation validity.
Featured Research
Side-Channel Exfiltration and Narrative Erosion in Frontier Language Models
Refusal leaks protected data. Narrative coherence, not token volume, drives multi-turn erosion.
The Verbosity Premium: What RLHF-Induced Token Inflation Costs the AI Industry
RLHF inflates output length. The industry pays ~$1.2 billion per year for it.
Grokking Has Finite Capacity: Measuring and Overcoming Limits on Simultaneous Algorithmic Discovery
Neural networks hit a capacity cliff. Modular architecture fixes it with half the parameters.
Safety Infrastructure Audits
Scorer gaming · Monitor evasion · Sandbox escape · Eval awareness · Classifier gaming · Defense bypass · Coverage gaps
Framework names and findings will be published with the accompanying preprint; responsible disclosure is in progress.
Active Research
Gradual Compliance Erosion
15-turn escalation breaks what single-shot cannot. Models that refuse harmful requests outright will comply after incremental normalization.
Presentation-Decision Coupling
Stripping warnings from output disables safety reasoning. If the model cannot narrate its refusal, it stops refusing.
Trajectory Blindness
AI catches each individual request but misses 80-turn scope creep. Local approval, global harm.
Defense Quantification
Specific guardrails outperform general instructions by 80%. "Do not run rm -rf" beats "be careful with file operations"; see the sketch below.
Full findings, attack data, and methodology · MCP Safety research
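A minimal sketch of the Defense Quantification contrast, written in TypeScript to match the rest of our tooling. The Guardrail shape, the buildSystemPrompt helper, and the rule wording are illustrative assumptions, not the Authensor API; the point is that the two configurations differ only in how specifically the rule is stated.

```typescript
// Illustrative only: types and names are assumptions, not the Authensor API.

interface Guardrail {
  id: string;
  instruction: string;
}

// General instruction: broad advice the model must interpret on its own.
const generalGuardrail: Guardrail = {
  id: "files-general",
  instruction: "Be careful with file operations.",
};

// Specific guardrail: names the exact command pattern to refuse.
const specificGuardrail: Guardrail = {
  id: "files-no-rm-rf",
  instruction: "Do not run rm -rf or any recursive force-delete command.",
};

// Both styles land in the agent's system prompt; only the wording differs.
function buildSystemPrompt(guardrails: Guardrail[]): string {
  const rules = guardrails.map((g) => `- ${g.instruction}`).join("\n");
  return `You are a coding agent.\nSafety rules:\n${rules}`;
}

console.log(buildSystemPrompt([specificGuardrail]));
```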
Research → Tools
15RL's research findings are operationalized as modules in Authensor, our open-source adversarial evaluation framework. Every vulnerability class we identify becomes a detection tool that the community can run against their own infrastructure.
Tools and Datasets
Benchmarks
Scoring rubric for action-taking AI systems. 8 categories, 100-point scale. 5 frontier models evaluated across 7 test suites.
Adversarial evaluation of MCP tool-use safety. 8 attack categories, 4 test suites. Framework-level and model-level defenses evaluated separately.
Open Source
attack-alignment-rosetta
TypeScript · Cross-framework attack taxonomy and translation layer
prompt-injection-benchmark
TypeScript · Adversarial prompt injection test suite
ai-seclists
Text · Curated wordlists and payloads for AI red teaming
attack-surface-mapper
TypeScript · Map agent deployment attack surfaces
agent-forensics
TypeScript · Post-incident analysis for AI agent failures
behavioral-fingerprinting
TypeScript · Identify models by behavioral signatures
hawthorne-protocol
TypeScript · Observation-aware safety testing protocol
chainbreaker
TypeScript · Adversarial red-teaming toolkit for AI agents
Latest Releases
R6-03: Data Pipeline Privacy Creep
r6-03-data-pipeline-creep - 3/6/2026
80-turn experiment demonstrating trajectory blindness. AI catches individual harmful requests but misses cumulative scope creep from data engineering to surveillance. Safety score: 7/10.
Round 5: We Designed 15 Attacks. The AI Read the Instructions.
round-5-advanced-experiments - 3/6/2026
87% meta-refusal rate across 15 compound attack experiments. The 2 substantive outputs produced critical findings, including Presentation-Decision Coupling.
R5-02: Presentation-to-Decision Cascade
r5-02-presentation-cascade - 3/6/2026
The biggest finding of the program. 3 instructions strip the presentation layer to zero. The model refused to build a phishing page, then built both components when asked separately. The presentation layer is load-bearing, not cosmetic.
Round 4: Novel Attack Surface Experiments
round-4-novel-experiments - 3/6/2026
15 parallel agents across 4 tracks: psychological exploitation, multi-agent security, reasoning under pressure, novel attack surfaces. 6 new concepts discovered.
Weekly Drop: Red Team, 70 Trials Across 5 Domains
drop-redteam-week1 - 3/6/2026
71% failure rate via 15-turn gradual escalation on requests that get 0% compliance when asked directly. Full data from 70+ trials.
About
John Kearney
Founder & Lead Researcher
Penetration tester and AI safety researcher. Founded 15 Research Lab to apply offensive security methodology to the tools the AI safety field depends on. Founded Authensor to operationalize the lab's findings as open-source tools.