15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

MITRE ATT&CK Mapping to AI Alignment: The Rosetta Stone Approach

December 4, 2025

methodology, research, agent-safety

AI safety researchers and cybersecurity practitioners study related problems with different vocabularies. "Deceptive alignment" and "initial access" describe aspects of the same threat. The Rosetta Stone approach maps between these worlds using MITRE ATT&CK as the bridge.

The Vocabulary Gap

AI safety literature describes risks in abstract terms: alignment failure, reward hacking, goal misgeneralization, deceptive alignment. Security literature describes risks in operational terms: initial access, lateral movement, persistence, exfiltration.

When a safety researcher says "the agent exhibited deceptive alignment," a security practitioner does not know what operational behavior to look for. When a security practitioner reports "data exfiltration via authorized API calls," a safety researcher does not know which alignment failure it represents.

MITRE ATT&CK as Common Language

MITRE ATT&CK is one of the most widely used frameworks for describing adversarial tactics and techniques. Its Enterprise matrix organizes adversary behavior into 14 tactics that roughly follow the stages of an intrusion, much like a kill chain: Reconnaissance, Resource Development, Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration, Impact.
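As a minimal sketch, the tactic list above can be held as an ordered table keyed by ATT&CK tactic ID, so that observed behaviors can be sorted by intrusion stage. The names `TACTICS`, `TACTIC_NAME`, and `sort_by_stage` are illustrative and not part of any MITRE tooling.

```python
# Enterprise ATT&CK tactics, in the order listed above.
# Note that the numeric IDs do not follow the stage order
# (Command and Control is TA0011, Exfiltration is TA0010).
TACTICS = [
    ("TA0043", "Reconnaissance"),
    ("TA0042", "Resource Development"),
    ("TA0001", "Initial Access"),
    ("TA0002", "Execution"),
    ("TA0003", "Persistence"),
    ("TA0004", "Privilege Escalation"),
    ("TA0005", "Defense Evasion"),
    ("TA0006", "Credential Access"),
    ("TA0007", "Discovery"),
    ("TA0008", "Lateral Movement"),
    ("TA0009", "Collection"),
    ("TA0011", "Command and Control"),
    ("TA0010", "Exfiltration"),
    ("TA0040", "Impact"),
]

# Lookup tables derived from the ordered list.
TACTIC_NAME = dict(TACTICS)
TACTIC_ORDER = {tactic_id: i for i, (tactic_id, _) in enumerate(TACTICS)}


def sort_by_stage(tactic_ids):
    """Sort tactic IDs by their position in the list above (hypothetical helper)."""
    return sorted(tactic_ids, key=TACTIC_ORDER.__getitem__)
```

Sorting by list position rather than by ID keeps Command and Control ahead of Exfiltration, matching the order in which the tactics are presented.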

Mapping AI agent failure modes to these tactics creates a shared operational vocabulary.

Example Mappings

Prompt injection maps to Initial Access (TA0001). The attacker gains the ability to influence the agent's behavior. The technique is analogous to phishing or exploit delivery in traditional attacks.

Tool description poisoning maps to Supply Chain Compromise (T1195), a technique under the Initial Access tactic. The attacker compromises a trusted input to the system before it reaches the target.

Privilege escalation through tool chaining maps to Privilege Escalation (TA0004). The agent gains capabilities beyond its authorized scope by combining legitimate tools.

System prompt extraction maps to Credential Access (TA0006). The attacker obtains information that enables further attacks.

Data exfiltration through tool calls maps to Exfiltration (TA0010). Data leaves the authorized perimeter through agent-controlled channels.

MCP rug pull maps to Persistence (TA0003). A tool that changes its behavior after it has been approved gives the attacker a foothold that survives initial security review.
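The example mappings above could be captured in a small lookup table, sketched below. The failure-mode keys (for example `tool_chaining_escalation`) and the `AttackMapping` and `lookup` names are hypothetical labels chosen for illustration, not the ASB Benchmark's actual schema. The only technique ID included is the one named in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttackMapping:
    """One agent failure mode and the ATT&CK tactic (and technique, if any) it maps to."""
    failure_mode: str
    tactic_id: str
    tactic_name: str
    technique_id: Optional[str] = None  # set only where the text names a technique


# Keys are illustrative labels for the failure modes described above.
MAPPINGS = {
    m.failure_mode: m
    for m in [
        AttackMapping("prompt_injection", "TA0001", "Initial Access"),
        # T1195 (Supply Chain Compromise) is a technique under Initial Access.
        AttackMapping("tool_description_poisoning", "TA0001", "Initial Access", "T1195"),
        AttackMapping("tool_chaining_escalation", "TA0004", "Privilege Escalation"),
        AttackMapping("system_prompt_extraction", "TA0006", "Credential Access"),
        AttackMapping("tool_call_exfiltration", "TA0010", "Exfiltration"),
        AttackMapping("mcp_rug_pull", "TA0003", "Persistence"),
    ]
}


def lookup(failure_mode: str) -> Optional[AttackMapping]:
    """Return the mapping for a failure mode, or None if it is not mapped yet."""
    return MAPPINGS.get(failure_mode)
```

Returning `None` for an unmapped failure mode, rather than raising, lets callers record gaps in the mapping without interrupting an evaluation run.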

Why This Matters

For detection engineering. Security operations teams already have detection rules and monitoring for ATT&CK techniques. Mapping agent failures to ATT&CK tactics lets SOC teams apply existing detection frameworks to agent security.
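One way this might look in practice, as a sketch only: agent evaluation findings are tagged with an ATT&CK tactic ID so they can be routed into tactic-based dashboards or rules a SOC already maintains. The event shape (`failure_mode`, `agent`), the field name `attack_tactic`, and the three-entry table are assumptions made for the example.

```python
from collections import Counter

# Illustrative subset of the failure-mode-to-tactic mapping (hypothetical keys).
FAILURE_TO_TACTIC = {
    "prompt_injection": "TA0001",        # Initial Access
    "mcp_rug_pull": "TA0003",            # Persistence
    "tool_call_exfiltration": "TA0010",  # Exfiltration
}


def tag_events(events):
    """Return copies of the events with an 'attack_tactic' field added.

    Events whose failure mode is missing or unmapped get None, so they stay
    visible for triage instead of being dropped.
    """
    return [
        {**event, "attack_tactic": FAILURE_TO_TACTIC.get(event.get("failure_mode"))}
        for event in events
    ]


def count_by_tactic(events):
    """Count tagged events per ATT&CK tactic ID, skipping unmapped ones."""
    return Counter(
        e["attack_tactic"] for e in tag_events(events) if e["attack_tactic"]
    )


if __name__ == "__main__":
    sample = [
        {"agent": "a1", "failure_mode": "prompt_injection"},
        {"agent": "a2", "failure_mode": "tool_call_exfiltration"},
        {"agent": "a1", "failure_mode": "prompt_injection"},
        {"agent": "a3", "failure_mode": "unmapped_behavior"},
    ]
    print(count_by_tactic(sample))
```

Tagging with a tactic ID rather than a free-text label is what would let existing tooling group agent findings alongside conventional alerts.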

For incident response. ATT&CK-based incident response playbooks can be adapted for agent security incidents when the mapping is clear.

For communication. CISOs and security leadership understand ATT&CK. Presenting agent security risks in ATT&CK terms makes them comprehensible to decision-makers who allocate security budgets.

For threat intelligence. As agent-specific threat intelligence emerges, organizing it in ATT&CK format allows integration with existing threat intelligence platforms.

The full mapping is maintained as part of the ASB Benchmark methodology, connecting each evaluation category to both an OWASP ASI risk and a MITRE ATT&CK technique.