15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Agent Attack Surface Enumeration

December 30, 202515 Research Lab

toolsred-teamagent-safety

You cannot defend what you have not mapped. Before building defenses, enumerate the full attack surface of your AI agent deployment.

Input Paths

Every way that data reaches the model is an input path and a potential injection vector:

User text input: The primary input. Direct prompt injection target.
File uploads: Documents, images, CSVs processed by the agent. Indirect injection vector.
Web content: URLs the agent fetches and processes. Attacker-controlled content.
Email content: If the agent reads emails, every email is potential injection.
Database records: Query results that enter the model's context. Can be poisoned by anyone with database write access.
API responses: Third-party API responses included in context. Attacker controlled if the API is compromised.
MCP tool descriptions: Tool definitions from MCP servers. Trusted channel, high injection impact.
MCP tool responses: Tool output that re-enters the model context. Injection vector post-tool-call.
Conversation history: Previous turns that shape model behavior. Manipulable in multi-turn attacks.
System prompt: If dynamically generated, the generation pipeline is an attack surface.

Tool Connections

Every tool the agent can call is both a capability and a risk:

What tools are available? List every tool with its capability category (read-only, write, execute, network).
What parameters do they accept? Unrestricted parameters expand the attack surface.
What can they access? Database tables, file paths, network endpoints, external services.
What data do they return? Sensitive data in tool responses enters the model's context.
How are they authenticated? Shared credentials, per-tool tokens, no authentication.

Data Sources

What data can the agent access, and who else can modify that data?

Databases: Which tables, with what access level? Who else writes to those tables?
File systems: Which directories? Are any world-writable?
Knowledge bases: Who contributes content? Is it reviewed before indexing?
External APIs: Are responses validated? Can API responses be manipulated?

Communication Channels

How does the agent communicate with the outside world?

User responses: Direct text output.
Tool calls: Outbound API requests, database queries, file operations.
Notifications: Email, Slack, webhook notifications the agent can send.
Logs: Logged data that is accessible to other systems.
Inter-agent messages: In multi-agent systems, messages to other agents.

The Attack Surface Mapper

The Attack Surface Mapper automates parts of this enumeration. It connects to MCP servers, inventories tools, analyzes descriptions and schemas, and produces a report covering:

Tool capability classification
Authentication and transport security status
Tool description injection risk scoring
Parameter validation coverage
SARIF output for integration with security dashboards

Manual enumeration is still needed for non-MCP input paths (file uploads, web content, database records). Automated scanning covers the MCP-specific surface.

Using the Enumeration

Once you have the full map, prioritize defenses:

High-capability tools (write, execute, network) get the strictest policies
Untrusted input paths (user input, web content, uploads) get the strongest scanning
Sensitive data sources get access restrictions and output scanning
Communication channels get egress controls

Review the enumeration quarterly. Tool additions, data source changes, and new integrations change the attack surface.