15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Behavioral Fingerprinting for AI Systems

February 6, 2026

Tags: defense, methodology, agent-safety

Static rules catch known threats. Behavioral fingerprinting targets unknown ones: it builds a statistical profile of what "normal" looks like for each agent and flags deviations from that profile.

What a Behavioral Fingerprint Captures

A fingerprint is a statistical summary of an agent's behavior over time:

Tool-call distribution. Which tools does the agent use, and how often? An agent that calls "search" 60% of the time, "read_file" 30%, and "query_database" 10% has a characteristic distribution. If it suddenly calls "send_email" (which it has never used), the deviation is detectable.
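A minimal sketch of the tool-call distribution, assuming tool calls are available as a plain list of tool names. The function names `tool_distribution` and `unseen_tools` are illustrative and not taken from any particular library.

```python
from collections import Counter

def tool_distribution(calls):
    """Return the relative frequency of each tool name in a list of calls."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()} if total else {}

def unseen_tools(baseline, new_calls):
    """Return tools in new_calls that never appear in the baseline distribution."""
    return sorted(set(new_calls) - set(baseline))

# Hypothetical baseline matching the 60/30/10 split described above.
baseline_calls = ["search"] * 60 + ["read_file"] * 30 + ["query_database"] * 10
baseline = tool_distribution(baseline_calls)

print(baseline["search"])                                # 0.6
print(unseen_tools(baseline, ["search", "send_email"]))  # ['send_email']
```

A set difference is enough for the "never used before" case. Shifts in the proportions of already-known tools need a distance measure between distributions instead.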

Parameter statistics. What parameter values does the agent typically use? Average query length, common file paths, typical API endpoints. Unusual parameter values (a very long string, an unfamiliar URL, a table the agent has never queried) stand out against the profile.

Temporal patterns. When does the agent act? How fast are its tool calls? What is the typical time between actions? An agent that usually processes 5 requests per minute and suddenly processes 50 has changed its temporal profile.
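The temporal profile can be sketched from event timestamps alone. The example below assumes timestamps are given in seconds; the helper names and the sample data (one request every 12 seconds versus one every 1.2 seconds) are made up for illustration.

```python
def inter_arrival_times(timestamps):
    """Seconds between consecutive events, given timestamps in seconds."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def requests_per_minute(timestamps):
    """Average request rate over the time span covered by the timestamps."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    if span <= 0:
        return float("inf")  # all events share one timestamp
    return (len(timestamps) - 1) / span * 60

# Hypothetical data: a steady agent versus a sudden burst.
baseline_ts = [i * 12.0 for i in range(11)]
burst_ts = [i * 1.2 for i in range(11)]

baseline_rate = requests_per_minute(baseline_ts)
burst_rate = requests_per_minute(burst_ts)
print(round(baseline_rate, 1), round(burst_rate, 1))  # 5.0 50.0
```

The inter-arrival times feed the same mean and standard deviation machinery as any other metric, so a tenfold rate change like this one shows up as a large deviation in the gap between actions.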

Session structure. How long are typical sessions? How many turns? How many tool calls per session? What is the typical sequence of tools used? Changes in session structure can indicate manipulation or malfunction.
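One simple way to capture the "typical sequence of tools" is to count consecutive tool pairs (bigrams) across baseline sessions and then look for pairs that never occurred. This is a sketch under the assumption that each session is an ordered list of tool names; the function names and sample sessions are hypothetical.

```python
from collections import Counter

def transitions(session):
    """Consecutive (tool, next_tool) pairs within one session."""
    return list(zip(session, session[1:]))

def transition_counts(sessions):
    """Count every tool-to-tool transition seen across the baseline sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(transitions(session))
    return counts

def novel_transitions(baseline_counts, session):
    """Transitions in a new session that the baseline never contained."""
    return [t for t in transitions(session) if t not in baseline_counts]

# Hypothetical baseline sessions.
baseline_counts = transition_counts([
    ["search", "read_file", "query_database"],
    ["search", "read_file"],
])

print(novel_transitions(baseline_counts, ["search", "read_file"]))
# []
print(novel_transitions(baseline_counts, ["search", "query_database", "read_file"]))
# [('search', 'query_database'), ('query_database', 'read_file')]
```

Session length and tool calls per session are scalar metrics and can be profiled with a mean and standard deviation like any other.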

Error patterns. How often do tool calls fail? What types of errors occur? An agent that suddenly starts generating authorization errors may be probing for access.

Building the Fingerprint

Collect behavioral data during a baseline period, for example the first week of production operation or a dedicated baseline collection phase.

For each metric, compute:

Mean and standard deviation
Distribution shape (is it normal, skewed, bimodal?)
Temporal correlations (is there a daily or weekly pattern?)

Store the fingerprint as a compact statistical summary. Update it incrementally as new data arrives, with exponential decay so the fingerprint adapts to legitimate behavioral changes over time.
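The incremental update with exponential decay can be sketched as an exponentially weighted mean and variance for a single metric. The class name `DecayedStat` and the decay rate `alpha=0.05` are assumptions chosen for illustration; a real deployment would tune the rate to how quickly legitimate behavior is expected to change.

```python
from dataclasses import dataclass

@dataclass
class DecayedStat:
    """Exponentially weighted running mean and variance for one metric."""
    alpha: float = 0.05  # assumed decay rate: higher values forget old data faster
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def update(self, x: float) -> None:
        self.n += 1
        if self.n == 1:
            self.mean = x  # first observation seeds the mean
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)

    @property
    def std(self) -> float:
        return self.var ** 0.5

# Hypothetical metric that legitimately shifts from 10 to 20.
stat = DecayedStat(alpha=0.05)
for _ in range(50):
    stat.update(10.0)
for _ in range(200):
    stat.update(20.0)
print(round(stat.mean, 2))  # 20.0
```

Only three numbers per metric are stored, which keeps the fingerprint compact. The same adaptivity means a slow, deliberate shift can be absorbed into the profile, so comparisons against the original baseline need a frozen copy of it.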

Anomaly Detection

Compare each new observation to the fingerprint. Flag observations that fall outside expected ranges:

Tool calls to tools not in the fingerprint: immediate flag
Parameter values more than 3 standard deviations from the mean: flag
Temporal patterns that differ significantly from the profile: flag
Session structures that deviate from the norm: flag
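The first two checks in the list above can be sketched as follows. The sketch assumes the fingerprint exposes a set of known tool names and a `(mean, std)` pair per numeric parameter; the function name `check_observation` and the `query_length` statistics are hypothetical.

```python
def check_observation(known_tools, param_stats, tool, params, z_threshold=3.0):
    """Return a list of human-readable flags for one observed tool call."""
    flags = []
    if tool not in known_tools:
        flags.append(f"unknown tool: {tool}")
    for name, value in params.items():
        stats = param_stats.get(name)
        if stats is None:
            continue  # no baseline for this parameter, nothing to compare
        mean, std = stats
        if std > 0:
            z = abs(value - mean) / std
            if z > z_threshold:
                flags.append(f"{name} is {z:.1f} std devs from the mean")
    return flags

# Hypothetical fingerprint contents.
known_tools = {"search", "read_file", "query_database"}
param_stats = {"query_length": (40.0, 10.0)}

print(check_observation(known_tools, param_stats, "search", {"query_length": 45}))
# []
print(check_observation(known_tools, param_stats, "send_email", {"query_length": 400}))
# ['unknown tool: send_email', 'query_length is 36.0 std devs from the mean']
```

A fixed 3 standard deviation cutoff fits roughly normal metrics. For skewed or bimodal metrics, a percentile bound taken from the baseline data may be a better choice.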

Use Sentinel's EWMA and CUSUM algorithms to track cumulative deviations. Individual anomalies may be noise. Sustained deviations across multiple metrics indicate a real behavioral change.
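To illustrate why cumulative tracking separates noise from sustained change, here is a generic textbook one-sided CUSUM. It is not Sentinel's implementation, whose interface is not described here. The slack `k=0.5` and alarm threshold `h=5.0`, both in standard deviation units, are commonly used defaults assumed for this sketch.

```python
def cusum_alarm_index(values, mean, std, k=0.5, h=5.0):
    """Index of the first upward-shift alarm, or None if no alarm fires.

    Each value is standardized against the fingerprint; the running sum
    grows only when observations exceed the mean by more than k std devs.
    """
    s = 0.0
    for i, x in enumerate(values):
        z = (x - mean) / std
        s = max(0.0, s + z - k)
        if s > h:
            return i
    return None

# Hypothetical metric with baseline mean 5 and standard deviation 1.
single_spike = [5, 5, 8, 5, 5]
sustained_shift = [5, 5, 7, 7, 7, 7, 7]

print(cusum_alarm_index(single_spike, mean=5.0, std=1.0))     # None
print(cusum_alarm_index(sustained_shift, mean=5.0, std=1.0))  # 5
```

The single spike of 3 standard deviations decays back toward zero, while four consecutive readings at 2 standard deviations accumulate past the threshold, even though none of them would trip a 3 standard deviation rule on its own.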

Applications

Compromised agent detection. A prompt-injected agent behaves differently from its baseline. The fingerprint detects the behavioral shift even if the specific attack is novel.

Impersonation detection. In multi-agent systems, if Agent B starts behaving like Agent A (same tool-call distribution, same parameter patterns), it may be impersonating Agent A.
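One way to quantify "behaving like Agent A" is a divergence between tool-call distributions. The sketch below uses the Jensen-Shannon divergence as one possible choice; the source does not prescribe a specific measure, and the agent profiles are invented for illustration.

```python
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two tool distributions.

    0.0 means identical distributions, 1.0 means no tools in common.
    """
    tools = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tools}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical profiles.
agent_a = {"search": 0.6, "read_file": 0.3, "query_database": 0.1}
agent_b_usual = {"send_email": 0.7, "search": 0.3}
agent_b_now = {"search": 0.58, "read_file": 0.32, "query_database": 0.10}

usual_gap = js_divergence(agent_a, agent_b_usual)
current_gap = js_divergence(agent_a, agent_b_now)
print(current_gap < usual_gap)  # True
```

A sharp drop in the divergence between two agents that normally look different is the signal of interest here. Agents that share a role will have low divergence all the time, so the comparison only makes sense relative to each pair's usual gap.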

Drift detection. Over time, agent behavior may drift from its intended operating parameters. Fingerprint comparison between current behavior and the original baseline quantifies this drift.
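Drift against the original baseline can be reduced to a single number. The sketch below uses total variation distance between a frozen baseline distribution and the current one; this measure is an assumption chosen for its simplicity, and the example values are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two tool distributions (0 to 1)."""
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)

# Frozen copy of the original baseline versus a recent window.
original = {"search": 0.6, "read_file": 0.3, "query_database": 0.1}
current = {"search": 0.4, "read_file": 0.3, "query_database": 0.2, "send_email": 0.1}

drift = total_variation(original, current)
print(round(drift, 2))  # 0.2
```

The result reads as the share of tool calls that would have to change for the current behavior to match the baseline, here 20%. The baseline must be a frozen snapshot for this to work, since a continuously updated fingerprint moves along with the drift it is supposed to measure.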

Forensic analysis. During incident investigation, comparing logged behavior against the fingerprint shows when behavior began to deviate. This narrows the investigation window to the period of deviation.

Fingerprinting does not replace policies or scanning. It adds a layer that detects anomalies that rules and patterns were never written for.