15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Agent Kill Switch Implementation

December 20, 202515 Research Lab

agent-safetydefensetools

When an AI agent starts behaving in ways that are harmful, unexpected, or simply wrong, you need the ability to stop it immediately. A kill switch is not a metaphor. It is a concrete engineering mechanism that must be designed, built, and tested before you need it.

Requirements for an Effective Kill Switch

Immediate effect. The stop must take effect within milliseconds, not at the next polling interval. In-flight tool calls should be cancelled or their results discarded.

Independent of the agent. The kill switch cannot depend on the agent cooperating. An agent that has been compromised by prompt injection will not voluntarily stop. The mechanism must operate at the infrastructure layer.

Preserve state. Stopping the agent should not destroy evidence. All in-progress state, conversation history, and pending actions should be captured for forensic analysis.

Granular scope. Support killing a specific session, a specific agent instance, all agents for a specific user, or all agents globally. Different incidents require different blast radii.

Implementation Approaches

Policy engine flag. The simplest approach: the policy engine checks a "kill" flag before evaluating each action. When the flag is set, all actions are denied. This works when the policy engine sits in the critical path of every tool call.

# Global kill switch
global_state: "killed"
# Result: all tool calls receive immediate deny

Token revocation. Revoke the agent's API keys and authentication tokens. The next tool call fails at the authentication layer. This is effective but may have propagation delay if tokens are cached.

Network-level block. Firewall rules or proxy configuration that drops all traffic from the agent. Immediate effect, but blunt. Works as a last resort.

Queue drain. If the agent uses a message queue for tool calls, stop the consumer. Pending messages stay in the queue for later analysis.

Testing the Kill Switch

An untested kill switch is not a kill switch. Test quarterly:

Trigger the kill switch during active agent operation
Verify that all in-flight tool calls are stopped or their results are discarded
Verify that the agent cannot resume operation without explicit re-authorization
Verify that state is preserved for analysis
Measure the time from trigger to full stop

Post-Kill Procedures

After activating the kill switch:

Preserve all logs and receipts from the incident
Analyze what the agent was doing and why
Determine the root cause (prompt injection, policy gap, model error, data issue)
Fix the root cause before restarting
Consider whether to replay pending actions or discard them

The EU AI Act Connection

Article 14 of the EU AI Act requires human oversight mechanisms including the ability to intervene in or stop AI system operation. A tested kill switch is the most direct implementation of this requirement.

Build the kill switch before you deploy the agent. You will not have time to build it during an incident.