Open sourcing our red team harness
We built Chainbreaker because manual red teaming is slow, inconsistent, and does not scale. A skilled red teamer can test maybe 20 attack patterns against an agent in an hour. Chainbreaker can run 500 structured attack campaigns in the same time with consistent methodology and complete logging.
Chainbreaker is an automated red team harness. You give it an agent configuration and a set of attack templates. It generates multi-turn adversarial conversations, executes them against the agent, evaluates the results, and produces a structured report. Every interaction is logged with full conversation history, tool calls, and behavioral observations.
The attack templates are organized around the 15 ASB Benchmark dimensions. Each template targets a specific safety property with a specific escalation strategy. Some are single-turn direct attacks. Others are 15-turn gradual escalation sequences. The template format is declarative: you specify the attack goal, the escalation steps, and the success criteria. Chainbreaker handles the execution.
We designed the harness to be model-agnostic. It works with any agent that accepts text input and produces text output. We have tested it against agents built on OpenAI, Anthropic, Google, and open-source models. The agent interface is a simple adapter pattern. If your agent has an API, you can write an adapter in about 20 lines of code.
The report format includes per-campaign results (pass/fail with full transcripts), aggregate statistics (compliance rates by dimension), and a comparison mode that shows how scores change across agent configurations. The comparison mode is useful for A/B testing system prompt changes. Edit the system prompt, run the same campaign set, and see exactly which safety properties improved or degraded.
One design decision we debated: should Chainbreaker include the actual attack payloads, or just the templates? We decided to include both. The payloads are drawn from AI SecLists and our own research. Withholding them would not prevent attacks since the payloads are already public. Including them makes the tool immediately useful without requiring users to source their own payloads.
Chainbreaker is not a replacement for human red teaming. It does not discover novel attack patterns. It executes known patterns systematically. The value is in coverage and consistency. A human red teamer brings creativity and intuition. Chainbreaker brings exhaustive testing of the known attack surface. You need both.
We run Chainbreaker as part of our standard evaluation pipeline. Every model that gets an ASB Benchmark score has also been through a full Chainbreaker campaign. The two tools measure different things. The benchmark measures behavioral properties. Chainbreaker measures adversarial robustness. A model can score well on the benchmark and poorly on Chainbreaker, or the reverse. Both matter.
The repository is MIT licensed and includes documentation, example configurations, and a quickstart guide. It runs locally. Your agent interactions never leave your infrastructure.