15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Agent Compliance Checklist

February 24, 202615 Research Lab

complianceagent-safetyguardrails

This checklist covers the technical controls needed for AI agent compliance. It applies whether you are targeting EU AI Act, SOC 2, or simply building a responsible agent deployment.

Authorization Controls

[ ] Policy engine deployed with fail-closed default (no policy = deny)
[ ] Every tool call evaluated against policy before execution
[ ] Role-based access controls for tool access
[ ] Parameter-level constraints on tool calls
[ ] Rate limits per tool, per agent, and per user
[ ] Budget caps on cumulative resource usage
[ ] No path from agent to tool execution that bypasses the policy engine

Content Safety

[ ] Input scanning for prompt injection on all user input
[ ] Tool description scanning for injection on MCP server connections
[ ] Tool response scanning before re-entry into model context
[ ] Output scanning before delivery to user
[ ] Encoding attack detection (base64, hex, unicode)
[ ] Multi-language detection coverage

Human Oversight

[ ] Approval workflows for high-risk actions
[ ] Kill switch that operates independently of the agent
[ ] Kill switch tested and verified (not just implemented)
[ ] Monitoring dashboard for real-time agent observation
[ ] Escalation procedures defined for different incident severities
[ ] Reviewers trained and roles documented

Audit Trail

[ ] Every tool call produces an audit record
[ ] Every policy evaluation produces an audit record
[ ] Every approval decision produces an audit record
[ ] Records include timestamp, identity, action, parameters, decision, outcome
[ ] Records are hash-chained for tamper evidence
[ ] Storage is append-only (no UPDATE or DELETE capability)
[ ] Retention period meets regulatory requirements
[ ] Chain verification runs periodically
[ ] Export capability for audit review

Monitoring

[ ] Behavioral monitoring active on tool-call patterns
[ ] Anomaly detection algorithms configured with appropriate thresholds
[ ] Alerting tiers defined (log, notify, escalate, automate)
[ ] Baseline established from normal operation
[ ] Monitoring data retained for trend analysis

Testing

[ ] Red team testing conducted with current payload corpus
[ ] Detection rates measured per attack category
[ ] Authorization bypass testing completed
[ ] Kill switch tested under load
[ ] Approval workflow tested end-to-end
[ ] Regression testing in CI/CD pipeline
[ ] Test results documented and retained

Documentation

[ ] Risk assessment completed and documented
[ ] System architecture documented including safety controls
[ ] Tool access inventory maintained
[ ] Policy rules documented with rationale
[ ] Incident response playbook written and accessible
[ ] Oversight roles and responsibilities documented
[ ] Known limitations documented

Operational Procedures

[ ] Incident response plan defined and tested
[ ] On-call rotation for agent monitoring
[ ] Regular red team exercises scheduled
[ ] Policy review cadence established
[ ] Monitoring threshold review cadence established
[ ] Audit trail integrity review cadence established

Not every item applies to every deployment. Low-risk agents in internal tools can skip some items. High-risk agents in regulated domains need all of them. Use the EU AI Act classification to determine your scope.