15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

EU AI Act Article 9: Risk Management for AI Agents

February 20, 202615 Research Lab

eu-ai-actcomplianceagent-safetymethodology

Article 9 of the EU AI Act mandates a risk management system for high-risk AI. It is not a one-time assessment. It is a continuous process throughout the system's lifecycle. Here is what it requires and how to implement it for AI agents.

What Article 9 Requires

The risk management system must:

Identify and analyze known and foreseeable risks associated with the AI system
Estimate and evaluate risks that may emerge during use, including misuse
Adopt risk management measures including design choices and technical safeguards
Ensure residual risk is acceptable after mitigation
Test the system to ensure risk management measures are effective

This is iterative. As the system operates and new risks emerge, the process cycles back to identification and analysis.

Risk Identification for AI Agents

For agent systems, the foreseeable risks include:

Prompt injection leading to unauthorized actions. Likelihood: high. Impact: depends on tool access. Mitigation: input scanning, policy enforcement, output validation.

Tool misuse where legitimate tools are used for unintended purposes. Likelihood: medium. Impact: high for write-capable tools. Mitigation: parameter validation, least privilege, behavioral monitoring.

Data exfiltration through tool calls or response channels. Likelihood: medium. Impact: high for sensitive data. Mitigation: egress controls, URL allowlisting, output scanning.

Privilege escalation through tool chaining or context manipulation. Likelihood: medium. Impact: high. Mitigation: transitive authorization, context verification.

Autonomous drift where unsupervised agents deviate from intended behavior. Likelihood: high for long-running agents. Impact: varies. Mitigation: behavioral monitoring, session limits, periodic review.

Mapping Mitigations to Technical Controls

Article 9 requires "appropriate risk management measures." These map to specific technical controls:

| Risk | Measure | Technical Control | |------|---------|------------------| | Prompt injection | Input validation | Aegis content scanner | | Tool misuse | Authorization | Policy engine with tool-level rules | | Data exfiltration | Egress control | URL allowlists, parameter scanning | | Privilege escalation | Access control | Role-based policies, fail-closed default | | Autonomous drift | Monitoring | Sentinel behavioral monitoring | | All risks | Auditability | Hash-chained receipt system |

Testing Requirements

Article 9(7) requires testing "against preliminary defined metrics and probabilistic thresholds." For agents, this means:

Red team testing against prompt injection payloads (using AI SecLists) with measured detection rates
Adversarial testing of authorization controls with measured bypass rates
Behavioral monitoring validation with measured detection latency
Approval workflow testing under various conditions

Document test results. Regulators will ask for evidence that your risk management measures work, not just that they exist.

Documentation

Maintain a risk register that tracks each identified risk, its assessment (likelihood, impact), the mitigations in place, residual risk, and testing results. Update it as the system changes and as new risks are identified.

This is not bureaucratic overhead if done correctly. The risk register drives engineering decisions about what controls to build and test. It should be a living document that the engineering team uses, not a compliance artifact that sits in a drawer.