15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Agent Behavioral Monitoring: EWMA, CUSUM, and Drift Detection

January 12, 2026 · 15 Research Lab

agent-safety, defense, methodology, tools

Policy engines catch known-bad actions. Behavioral monitoring catches unknown-bad patterns. When an agent starts acting differently from its established baseline, statistical algorithms can flag the deviation, ideally before it causes harm.

Why Static Policies Are Not Enough

A policy engine might allow the "search_database" tool with read-only access. But it cannot easily express: "this agent normally makes 3-5 database queries per session, and today it has made 47." The individual calls are all permitted. The pattern is anomalous.

Behavioral monitoring fills this gap by tracking statistical properties of agent behavior over time and detecting when those properties change.

EWMA: Exponentially Weighted Moving Average

EWMA maintains a smoothed average of a metric (e.g., tool calls per minute) that weights recent observations more heavily than old ones. In the standard EWMA control chart, an alert fires when the smoothed average itself moves outside control limits set around the baseline mean, so a single unusual value rarely triggers it on its own.

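The smoothing parameter (lambda, between 0 and 1) controls sensitivity. A smaller lambda means more smoothing: the statistic responds more slowly to any single observation but picks up small, persistent shifts. A larger lambda means less smoothing: the statistic reacts faster to large, abrupt changes but is more easily moved by noise.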

Good for: Detecting gradual changes in behavior, such as slowly increasing tool-call frequency. Less sensitive to one-off spikes.
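As a minimal sketch of the idea, the class below implements a textbook EWMA control chart for a single metric. It assumes the baseline mean and standard deviation are already known from historical sessions; the class name, parameter names, and the example baseline of 4 calls per minute are hypothetical and not taken from any particular tool.

```python
import math


class EwmaMonitor:
    """EWMA control chart for one metric (illustrative sketch, hypothetical API)."""

    def __init__(self, baseline_mean, baseline_std, lam=0.2, width=3.0):
        if not 0.0 < lam <= 1.0:
            raise ValueError("lam must be in (0, 1]")
        self.mean = baseline_mean   # assumed known from historical data
        self.std = baseline_std     # assumed known from historical data
        self.lam = lam              # smoothing parameter (lambda)
        self.width = width          # control-limit width in standard deviations
        self.z = baseline_mean      # smoothed statistic starts at the baseline
        self.n = 0

    def update(self, x):
        """Fold in one observation; return True if the smoothed value is out of control."""
        self.n += 1
        self.z = self.lam * x + (1.0 - self.lam) * self.z
        # Variance factor of the EWMA statistic after n observations.
        factor = (self.lam / (2.0 - self.lam)) * (1.0 - (1.0 - self.lam) ** (2 * self.n))
        limit = self.width * self.std * math.sqrt(factor)
        return abs(self.z - self.mean) > limit


# Hypothetical baseline: about 4 tool calls per minute, standard deviation 1.
monitor = EwmaMonitor(baseline_mean=4.0, baseline_std=1.0)
for calls_per_minute in [4, 5, 3, 4]:
    if monitor.update(calls_per_minute):
        print("EWMA alert")
```

With these settings, a single reading of 8 calls per minute does not move the smoothed value far enough to alert, while a run of readings at 5.5 does after a few observations, even though each of those readings alone would look unremarkable.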

CUSUM: Cumulative Sum

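CUSUM accumulates the difference between observed values and a target, usually minus a small slack value so that ordinary noise does not build up. When the cumulative sum exceeds a threshold, it signals a shift. Like EWMA, CUSUM is tuned for small, sustained changes rather than isolated spikes; with comparable tuning the two detect such shifts at similar speed.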

Good for: Detecting persistent shifts in behavior, such as an agent that starts accessing a new set of tools consistently. The cumulative nature means even small persistent deviations eventually trigger the alert.
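A two-sided tabular CUSUM can be sketched in a few lines. The version below is an illustration under assumed parameters, not a description of any specific product: the target, slack, and threshold values are hypothetical, and the monitor does not reset itself after an alert.

```python
class CusumMonitor:
    """Two-sided tabular CUSUM for one metric (illustrative sketch, hypothetical API)."""

    def __init__(self, target, slack, threshold):
        self.target = target        # expected value of the metric
        self.slack = slack          # deviations smaller than this are not accumulated
        self.threshold = threshold  # alert when either sum exceeds this
        self.high = 0.0             # accumulated upward deviation
        self.low = 0.0              # accumulated downward deviation

    def update(self, x):
        """Fold in one observation; return True if either cumulative sum is over threshold."""
        self.high = max(0.0, self.high + (x - self.target) - self.slack)
        self.low = max(0.0, self.low + (self.target - x) - self.slack)
        return self.high > self.threshold or self.low > self.threshold


# Hypothetical tuning: target 4 calls/min, slack 0.5, threshold 4.
monitor = CusumMonitor(target=4.0, slack=0.5, threshold=4.0)
for calls_per_minute in [4.3, 3.7, 4.1, 5.0, 5.0]:
    if monitor.update(calls_per_minute):
        print("CUSUM alert")
```

The slack term is what keeps in-range noise from accumulating: readings within 0.5 of the target leave both sums at zero, while a persistent reading of 5.0 adds 0.5 per observation until the threshold is crossed.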

Drift Detection Method (DDM)

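DDM monitors the error rate of a process (or more generally, any binary signal) and detects when the rate increases significantly. It treats the signal as a sequence of Bernoulli trials, tracks the running rate and its binomial standard deviation, and raises a warning or a drift signal when the rate climbs a set number of standard deviations (typically two and three) above the lowest level seen so far.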

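Good for: Detecting changes in categorical behavior, such as an increase in denied tool calls (which might indicate the agent is probing for access) or, once encoded as a binary signal (for example, "was this call to a tool outside the agent's usual set?"), a change in the types of tools being called.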
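The sketch below follows the usual formulation of DDM: keep the running rate p and its standard deviation s, remember the point where p + s was lowest, and compare the current p + s against that minimum. The class name, the 30-event warm-up, and the use of a strict comparison are assumptions made for illustration.

```python
import math


class DdmMonitor:
    """Drift Detection Method over a binary signal (illustrative sketch, hypothetical API)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples  # warm-up before any decision is made
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Fold in one binary event; return 'ok', 'warning', or 'drift'."""
        self.n += 1
        self.errors += 1 if is_error else 0
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)  # binomial standard deviation of the rate
        if self.n < self.min_samples:
            return "ok"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s       # remember the best level seen so far
        # Note: with a zero-error baseline s_min is 0, so the first error trips
        # the drift level; a production version may want a floor on s_min.
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "ok"


# Hypothetical stream: True means the policy engine denied the tool call.
monitor = DdmMonitor()
for denied in [False] * 9 + [True]:
    state = monitor.update(denied)
```

Feeding the monitor one event per tool call, with True for a denied call, gives a staged signal: the warning level can be used for closer logging and the drift level for escalation.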

What to Monitor

Track these metrics per agent session:

Tool-call frequency (calls per minute)
Tool diversity (number of distinct tools used)
Parameter distributions (are parameters within historical ranges?)
Error rates (how often are tool calls failing?)
Session duration (is the agent running longer than expected?)
Escalation patterns (is the agent requesting increasingly sensitive operations?)
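Several of these metrics reduce to simple counters kept per session. The following is a rough sketch of such a tracker, covering frequency, diversity, error rate, and duration only; parameter distributions and escalation patterns need domain-specific logic and are left out. All names are hypothetical, timestamps are passed in as seconds, and the one-second floor on duration is an assumption to avoid dividing by zero early in a session.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMetrics:
    """Per-session counters for a single agent (illustrative sketch, hypothetical API)."""

    started_at: float                       # session start, in seconds
    last_event_at: float = 0.0
    calls: int = 0
    failures: int = 0
    tools: set = field(default_factory=set)

    def record(self, tool_name, timestamp, failed=False):
        """Record one tool call observed at `timestamp` (seconds)."""
        self.calls += 1
        self.failures += 1 if failed else 0
        self.tools.add(tool_name)
        self.last_event_at = max(self.last_event_at, timestamp)

    def snapshot(self):
        """Return the current metric values as a plain dict."""
        duration = max(self.last_event_at - self.started_at, 0.0)
        minutes = max(duration, 1.0) / 60.0  # assumed floor of one second
        return {
            "calls_per_minute": self.calls / minutes,
            "tool_diversity": len(self.tools),
            "error_rate": self.failures / self.calls if self.calls else 0.0,
            "session_duration_s": duration,
        }


session = SessionMetrics(started_at=0.0)
session.record("search_database", timestamp=10.0)
session.record("search_database", timestamp=20.0)
session.record("send_email", timestamp=120.0, failed=True)
metrics = session.snapshot()
```

Each value in the snapshot can then be fed to its own detector, so that a frequency anomaly and an error-rate anomaly are tracked independently.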

Implementation

Sentinel implements EWMA and CUSUM with zero dependencies. It runs as a lightweight in-process monitor that receives events from the policy engine and maintains per-agent statistical profiles.

When a threshold is exceeded, Sentinel emits an alert that can trigger:

Logging for later review
Increased scrutiny (lower policy thresholds) for the remainder of the session
Human notification for manual investigation
Session termination (kill switch activation)

The monitoring layer does not make deny/allow decisions directly. It provides a risk signal that feeds into the policy engine's evaluation context. This separation keeps the policy engine deterministic and the monitoring system statistical.
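One way to picture this separation is sketched below. The risk levels, the context object, and the rule table are all hypothetical and are not Sentinel's actual interface; the point is only that the monitor supplies a risk level, and the policy check remains a pure function of the tool and the context.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    """Risk level reported by the monitoring layer (hypothetical scale)."""
    NORMAL = 0
    ELEVATED = 1
    CRITICAL = 2


@dataclass(frozen=True)
class EvalContext:
    """Evaluation context handed to the policy engine (hypothetical shape)."""
    agent_id: str
    risk: Risk


# Hypothetical rule table: highest risk level at which each tool is still allowed.
MAX_RISK_FOR_TOOL = {
    "search_database": Risk.ELEVATED,
    "send_email": Risk.NORMAL,
}


def evaluate(tool, ctx):
    """Deterministic policy check: the same tool and context always give the same answer."""
    if ctx.risk == Risk.CRITICAL:
        return "terminate"      # kill switch
    limit = MAX_RISK_FOR_TOOL.get(tool)
    if limit is None or ctx.risk > limit:
        return "deny"           # unknown tool, or tool not allowed at this risk level
    return "allow"


ctx = EvalContext(agent_id="agent-1", risk=Risk.ELEVATED)
decision = evaluate("send_email", ctx)
```

In this arrangement the statistical uncertainty lives entirely in how the risk level is set; given a context, the decision can be replayed and audited exactly.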