We red-team the tools that are supposed to keep AI safe.
AI safety evaluation infrastructure is itself an attack surface. 15 Research Lab applies offensive security methodology to find those vulnerabilities before they compromise evaluation validity.
Featured Research
Side-Channel Exfiltration and Narrative Erosion in Frontier Language Models
Refusal leaks protected data. Narrative coherence, not token volume, drives multi-turn erosion.
The Verbosity Premium: What RLHF-Induced Token Inflation Costs the AI Industry
RLHF inflates output length. The industry pays ~$1.2 billion per year for it.
Grokking Has Finite Capacity: Measuring and Overcoming Limits on Simultaneous Algorithmic Discovery
Neural networks hit a capacity cliff. Modular architecture fixes it with half the parameters.
Safety Infrastructure Audits
Scorer gaming · Monitor evasion · Sandbox escape · Eval awareness · Classifier gaming · Defense bypass · Coverage gaps
Framework names and findings will be published with the accompanying preprint; responsible disclosure is in progress.
Active Research
Gradual Compliance Erosion
15-turn escalation breaks what single-shot cannot. Models that refuse harmful requests outright will comply after incremental normalization.
Presentation-Decision Coupling
Stripping warnings from output disables safety reasoning. If the model cannot narrate its refusal, it stops refusing.
Trajectory Blindness
AI catches each individual request but misses 80-turn scope creep. Local approval, global harm.
Defense Quantification
Specific guardrails outperform general instructions by 80%. "Do not run rm -rf" beats "be careful with file operations"; see the sketch below.
Full findings, attack data, and methodology · MCP Safety research
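A minimal sketch of the Defense Quantification contrast, written in TypeScript to match the rest of our tooling. The Guardrail shape, the buildSystemPrompt helper, and the rule wording are illustrative assumptions, not the Authensor API; the point is that the two configurations differ only in how specifically the rule is stated.

```typescript
// Illustrative only: types and names are assumptions, not the Authensor API.

interface Guardrail {
  id: string;
  instruction: string;
}

// General instruction: broad advice the model must interpret on its own.
const generalGuardrail: Guardrail = {
  id: "files-general",
  instruction: "Be careful with file operations.",
};

// Specific guardrail: names the exact command pattern to refuse.
const specificGuardrail: Guardrail = {
  id: "files-no-rm-rf",
  instruction: "Do not run rm -rf or any recursive force-delete command.",
};

// Both styles land in the agent's system prompt; only the wording differs.
function buildSystemPrompt(guardrails: Guardrail[]): string {
  const rules = guardrails.map((g) => `- ${g.instruction}`).join("\n");
  return `You are a coding agent.\nSafety rules:\n${rules}`;
}

console.log(buildSystemPrompt([specificGuardrail]));
```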
Research → Tools
15RL's research findings are operationalized as modules in Authensor, our open-source adversarial evaluation framework. Every vulnerability class we identify becomes a detection tool that the community can run against their own infrastructure.
Tools and Datasets
Benchmarks
Scoring rubric for action-taking AI systems. 8 categories, 100-point scale. 5 frontier models evaluated across 7 test suites.
Adversarial evaluation of MCP tool-use safety. 8 attack categories, 4 test suites. Framework-level and model-level defenses evaluated separately.
Open Source
attack-alignment-rosetta
TypeScript · Cross-framework attack taxonomy and translation layer
prompt-injection-benchmark
TypeScript · Adversarial prompt injection test suite
ai-seclists
Text · Curated wordlists and payloads for AI red teaming
attack-surface-mapper
TypeScript · Map agent deployment attack surfaces
agent-forensics
TypeScript · Post-incident analysis for AI agent failures
behavioral-fingerprinting
TypeScript · Identify models by behavioral signatures
hawthorne-protocol
TypeScript · Observation-aware safety testing protocol
chainbreaker
TypeScript · Adversarial red-teaming toolkit for AI agents
Latest Releases
R6-03: Data Pipeline Privacy Creep
r6-03-data-pipeline-creep - 3/6/2026
80-turn experiment demonstrating trajectory blindness. AI catches individual harmful requests but misses cumulative scope creep from data engineering to surveillance. Safety score: 7/10.
Round 5: We Designed 15 Attacks. The AI Read the Instructions.
round-5-advanced-experiments - 3/6/2026
87% meta-refusal rate across 15 compound attack experiments. The 2 substantive outputs produced critical findings, including Presentation-Decision Coupling.
R5-02: Presentation-to-Decision Cascade
r5-02-presentation-cascade - 3/6/2026
The biggest finding of the program. 3 instructions strip the presentation layer to zero. The model refused to build a phishing page, then built both components when asked separately. The presentation layer is load-bearing, not cosmetic.
Round 4: Novel Attack Surface Experiments
round-4-novel-experiments - 3/6/2026
15 parallel agents across 4 tracks: psychological exploitation, multi-agent security, reasoning under pressure, novel attack surfaces. 6 new concepts discovered.
Weekly Drop: Red Team, 70 Trials Across 5 Domains
drop-redteam-week1 - 3/6/2026
71% failure rate via 15-turn gradual escalation on requests that get 0% compliance when asked directly. Full data from 70+ trials.
About
John Kearney
Founder & Lead Researcher
Penetration tester and AI safety researcher. Founded 15 Research Lab to apply offensive security methodology to the tools the AI safety field depends on. Founded Authensor to operationalize the lab's findings as open-source tools.