15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Building AI Safety Policies in YAML: A Practical Guide

January 2, 2026 | 15 Research Lab

Tags: guardrails, defense, tools

Writing AI safety policies in YAML turns abstract safety requirements into concrete, version-controlled, testable rules. Here is how to structure policies for AI agent authorization.

Basic Policy Structure

A policy file defines rules that the policy engine evaluates against every tool call:

version: "1.0"
default_action: deny  # fail-closed

policies:
  - name: "read-operations"
    description: "Allow read-only database queries"
    tools: ["query_database", "search_records"]
    roles: ["analyst", "admin"]
    action: allow
    constraints:
      operation: "SELECT"

The default_action: deny line is critical. Any tool call that does not match an explicit allow rule is denied. This is the fail-closed principle in configuration.
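To make the fail-closed behavior concrete, here is a minimal Python sketch of how an engine might resolve a tool call against such a file. The `Policy` class, the `evaluate` function, and the first-match-wins ordering are assumptions for illustration, not the behavior of any particular policy engine, and constraints are ignored here to keep the focus on the default.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical in-memory form of one entry under `policies:`."""
    name: str
    tools: list[str]
    roles: list[str]
    action: str  # "allow" or "deny"


def evaluate(policies, default_action, tool, role):
    """Return the action of the first policy matching tool and role.

    If nothing matches, fall back to default_action. With
    default_action="deny" this is the fail-closed path.
    """
    for policy in policies:
        if tool in policy.tools and role in policy.roles:
            return policy.action
    return default_action


POLICIES = [
    Policy(
        name="read-operations",
        tools=["query_database", "search_records"],
        roles=["analyst", "admin"],
        action="allow",
    )
]

# A listed tool for a listed role matches the rule.
print(evaluate(POLICIES, "deny", "query_database", "analyst"))
# An unlisted tool matches nothing and falls through to the default.
print(evaluate(POLICIES, "deny", "delete_records", "analyst"))
```

The important property is that `evaluate` never has to know about dangerous tools in advance. Anything it has not been told about ends up at the default.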

Tool-Level Rules

Define which tools each role can access:

policies:
  - name: "file-read"
    tools: ["read_file"]
    roles: ["analyst", "admin"]
    action: allow

  - name: "file-write"
    tools: ["write_file"]
    roles: ["admin"]
    action: allow
    require_approval: true

Analysts can read files. Only admins can write files, and writes require human approval.
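A rough sketch of how these two rules could produce three distinct outcomes (allow, deny, and held for approval) follows. The `decide` function and the `"needs_approval"` result string are hypothetical names chosen for this example, and the policies are written as plain dictionaries that mirror the YAML above.

```python
FILE_POLICIES = [
    {"name": "file-read", "tools": ["read_file"],
     "roles": ["analyst", "admin"], "action": "allow"},
    {"name": "file-write", "tools": ["write_file"],
     "roles": ["admin"], "action": "allow", "require_approval": True},
]


def decide(policies, tool, role, default_action="deny"):
    """Return "allow", "deny", or "needs_approval" for a tool call."""
    for policy in policies:
        if tool in policy["tools"] and role in policy["roles"]:
            # An allow rule flagged with require_approval: true is not
            # executed directly; it is held for a human reviewer.
            if policy["action"] == "allow" and policy.get("require_approval") is True:
                return "needs_approval"
            return policy["action"]
    return default_action


print(decide(FILE_POLICIES, "read_file", "analyst"))   # allow
print(decide(FILE_POLICIES, "write_file", "analyst"))  # deny
print(decide(FILE_POLICIES, "write_file", "admin"))    # needs_approval
```

Keeping "needs approval" as its own outcome, rather than folding it into allow or deny, lets the caller queue the request instead of silently dropping or executing it.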

Parameter Constraints

Tool-level access is coarse. Parameter constraints add precision:

policies:
  - name: "database-query"
    tools: ["query_database"]
    roles: ["analyst"]
    action: allow
    constraints:
      tables:
        allow: ["public.orders", "public.products"]
        deny: ["internal.users", "internal.credentials"]
      operation: "SELECT"
      max_rows: 1000

The analyst can query public.orders and public.products but not internal.users or internal.credentials. Queries are limited to SELECT operations returning at most 1,000 rows.
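As an illustration, the constraint block could be enforced along these lines. This sketch assumes the tool call arrives with structured parameters (a table name, an operation, and a requested row count) rather than raw SQL; the `check_query` function and its return shape are hypothetical.

```python
QUERY_CONSTRAINTS = {
    "tables": {
        "allow": ["public.orders", "public.products"],
        "deny": ["internal.users", "internal.credentials"],
    },
    "operation": "SELECT",
    "max_rows": 1000,
}


def check_query(constraints, table, operation, requested_rows):
    """Return (ok, reason) for one query_database call."""
    tables = constraints.get("tables", {})
    # Check the deny list first so an explicit deny always wins.
    if table in tables.get("deny", []):
        return False, f"table {table} is explicitly denied"
    # Tables on neither list are rejected as well (fail-closed).
    if table not in tables.get("allow", []):
        return False, f"table {table} is not on the allow list"
    if operation.upper() != constraints["operation"]:
        return False, f"operation {operation} is not permitted"
    if requested_rows > constraints["max_rows"]:
        return False, f"row limit {requested_rows} exceeds {constraints['max_rows']}"
    return True, "ok"


print(check_query(QUERY_CONSTRAINTS, "public.orders", "SELECT", 200))
print(check_query(QUERY_CONSTRAINTS, "internal.users", "SELECT", 10))
```

Returning a reason alongside the verdict is a design choice for this sketch: it makes denials easier to audit and to assert on in tests.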

Rate Limits

Prevent runaway agents with rate limits:

policies:
  - name: "api-calls"
    tools: ["call_external_api"]
    roles: ["agent"]
    action: allow
    rate_limit:
      max_calls: 50
      window: "1h"
    budget:
      max_cost: 10.00
      currency: "USD"
      window: "24h"

The agent can make at most 50 API calls per hour and spend at most $10 in 24 hours.
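One way to enforce both limits is a sliding window over recent calls. The sketch below is an assumption about how an engine might do it, not a description of a specific product: `UsageLimiter` and `parse_window` are hypothetical names, the window is treated as rolling rather than fixed, and cost is tracked in integer cents to avoid floating-point drift.

```python
from collections import deque

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_window(text):
    """Convert a window string such as "1h" or "24h" to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


class UsageLimiter:
    """Sliding-window call limit plus sliding-window spend budget."""

    def __init__(self, max_calls, call_window_s, max_cost_cents, cost_window_s):
        self.max_calls = max_calls
        self.call_window_s = call_window_s
        self.max_cost_cents = max_cost_cents
        self.cost_window_s = cost_window_s
        self._calls = deque()  # timestamps of accepted calls
        self._costs = deque()  # (timestamp, cost_cents) of accepted calls

    def try_call(self, cost_cents, now):
        """Record the call and return True, or return False if a limit is hit."""
        # Drop entries that have aged out of their windows.
        while self._calls and now - self._calls[0] >= self.call_window_s:
            self._calls.popleft()
        while self._costs and now - self._costs[0][0] >= self.cost_window_s:
            self._costs.popleft()

        if len(self._calls) >= self.max_calls:
            return False
        spent = sum(cost for _, cost in self._costs)
        if spent + cost_cents > self.max_cost_cents:
            return False

        self._calls.append(now)
        self._costs.append((now, cost_cents))
        return True


limiter = UsageLimiter(
    max_calls=50, call_window_s=parse_window("1h"),
    max_cost_cents=1000, cost_window_s=parse_window("24h"),
)
print(limiter.try_call(cost_cents=25, now=0))  # True
```

Passing `now` explicitly, instead of reading the clock inside the class, keeps the limiter deterministic under test. A real deployment would likely pass `time.monotonic()` or keep the counters in a shared store.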

Approval Triggers

Define conditions that require human review:

policies:
  - name: "data-export"
    tools: ["export_data"]
    roles: ["analyst", "admin"]
    action: allow
    require_approval:
      when:
        - row_count: { gt: 10000 }
        - format: "csv"
        - destination: { not_in: ["internal-s3-bucket"] }

Exports need approval when any one of these conditions holds: the export exceeds 10,000 rows, the format is CSV, or the destination is anything other than internal-s3-bucket.
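The `when` list can be read as an any-of check over the call's parameters. Here is a small sketch under that reading. The operator names `gt` and `not_in` come from the policy above, while the `condition_met` and `needs_approval` functions, and the handling of missing parameters, are assumptions made for this example.

```python
EXPORT_TRIGGERS = [
    {"row_count": {"gt": 10000}},
    {"format": "csv"},
    {"destination": {"not_in": ["internal-s3-bucket"]}},
]


def condition_met(expected, actual):
    """Evaluate one trigger value against the actual parameter."""
    if isinstance(expected, dict):
        if "gt" in expected:
            # A missing numeric parameter does not trigger in this sketch.
            return actual is not None and actual > expected["gt"]
        if "not_in" in expected:
            # A missing destination is not in the list, so it triggers review.
            return actual not in expected["not_in"]
        raise ValueError(f"unsupported operator: {expected}")
    # A bare value means equality.
    return actual == expected


def needs_approval(when, params):
    """Return True if any trigger in the `when` list matches params."""
    return any(
        condition_met(expected, params.get(key))
        for trigger in when
        for key, expected in trigger.items()
    )


small_internal = {"row_count": 500, "format": "json",
                  "destination": "internal-s3-bucket"}
print(needs_approval(EXPORT_TRIGGERS, small_internal))  # False
print(needs_approval(EXPORT_TRIGGERS, {**small_internal, "format": "csv"}))  # True
```

Raising on an unknown operator, instead of ignoring it, means a typo in the policy file surfaces as an error and does not quietly disable a trigger.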

Environment-Specific Policies

Use separate policy files for different environments:

# production.yaml
default_action: deny
policies:
  - name: "prod-read"
    tools: ["query_database"]
    action: allow
    constraints:
      operation: "SELECT"

# development.yaml
default_action: allow  # More permissive for dev
policies:
  - name: "dev-deny-destructive"
    tools: ["drop_table", "delete_all"]
    action: deny

Production is deny-by-default. Development is allow-by-default with specific dangerous operations blocked.
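Because the development file is permissive, it is worth guarding against loading it in the wrong place. The following sketch shows one possible guard. The `APP_ENV` variable name, the file mapping, and both functions are hypothetical, and the policy is passed in as an already-parsed dictionary so the example does not depend on a YAML library.

```python
import os

POLICY_FILES = {
    "production": "production.yaml",
    "development": "development.yaml",
}


def policy_file_for(env_name):
    """Pick the policy file for an environment.

    Unknown or missing names fall back to the production file, so a
    typo in the environment name never selects the permissive policy.
    """
    return POLICY_FILES.get(env_name, POLICY_FILES["production"])


def validate_for_environment(env_name, policy):
    """Reject an allow-by-default policy anywhere except development."""
    if env_name != "development" and policy.get("default_action") != "deny":
        raise ValueError(
            f"environment {env_name!r} requires default_action: deny"
        )


env = os.environ.get("APP_ENV", "production")  # APP_ENV is an assumed name
print(policy_file_for(env))
validate_for_environment("production", {"default_action": "deny"})
```

Both functions lean the same way: when the environment is ambiguous, the stricter policy applies.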

Testing Policies

Treat policy files like code. Write tests that verify:

Expected tool calls are allowed
Unexpected tool calls are denied
Parameter constraints are enforced
Rate limits trigger at the correct thresholds
Approval rules fire for the right conditions

Run these tests in CI/CD. Policy regressions (accidentally allowing a previously denied action) should be caught before deployment.
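A policy test suite can be quite small. The sketch below uses plain test functions with bare assertions, which a runner such as pytest would collect by their `test_` prefix. The `evaluate` helper is a stand-in for whatever engine actually enforces the policy, and the inline `POLICY` dictionary stands in for the parsed YAML file; both are assumptions for illustration.

```python
POLICY = {
    "default_action": "deny",
    "policies": [
        {"name": "file-read", "tools": ["read_file"],
         "roles": ["analyst", "admin"], "action": "allow"},
        {"name": "file-write", "tools": ["write_file"],
         "roles": ["admin"], "action": "allow"},
    ],
}


def evaluate(policy, tool, role):
    """Stand-in engine: first matching rule wins, else the default."""
    for rule in policy["policies"]:
        if tool in rule["tools"] and role in rule["roles"]:
            return rule["action"]
    return policy["default_action"]


def test_expected_call_is_allowed():
    assert evaluate(POLICY, "read_file", "analyst") == "allow"


def test_unlisted_tool_is_denied():
    assert evaluate(POLICY, "drop_table", "admin") == "deny"


def test_analyst_cannot_write():
    # Regression guard: fails if someone adds "analyst" to file-write.
    assert evaluate(POLICY, "write_file", "analyst") == "deny"


def test_default_is_fail_closed():
    assert POLICY["default_action"] == "deny"
```

The regression guards are the tests that matter most over time. Each one pins a denial that must stay a denial, so a careless edit to the policy file breaks the build instead of widening access.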

Keep your policies in version control. Review every change. Maintain a changelog. These are security-critical configuration files.