15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Prompt Injection Defense Comparison: Aegis, NeMo Guardrails, Lakera, and Custom Solutions

December 6, 2025 · 15 Research Lab

prompt-injection, defense, guardrails, tools

The market for prompt injection defense tools has matured past the "roll your own regex" phase. Several options now exist, each with a different architecture and its own tradeoffs.

Aegis (15 Research Lab / Authensor)

Approach: Multi-layer scanning pipeline. Pattern matching first, then statistical analysis, then composite scoring. Zero runtime dependencies.

Strengths: Runs entirely in-process with no network calls. Sub-millisecond latency for pattern scanning. Configurable sensitivity thresholds. Open source under MIT license. Scans tool descriptions and responses, not just user input.

Tradeoffs: No ML classifier in the core scanner (by design, to maintain zero dependencies). Relies on pattern breadth and statistical signals rather than learned representations.

Best for: Applications that need deterministic, auditable scanning with no external service dependencies.
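To make the three-layer idea concrete, here is a minimal sketch of a pattern, statistics, and composite-score pipeline. It is not Aegis's actual API or rule set. The pattern list, the entropy signal, the weights, and the names `PATTERNS`, `ScanResult`, and `scan` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical patterns for illustration; a real scanner ships a far broader set.
PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?system prompt", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]


@dataclass
class ScanResult:
    pattern_hits: int
    entropy: float
    score: float
    flagged: bool


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str, threshold: float = 0.5) -> ScanResult:
    # Layer 1: pattern matching against known injection phrasings.
    hits = sum(1 for pattern in PATTERNS if pattern.search(text))

    # Layer 2: a statistical signal. Unusually high entropy can hint at
    # encoded or obfuscated payloads (assumed cutoff of 5.0 bits).
    entropy = shannon_entropy(text)

    # Layer 3: composite score. The weights below are assumptions.
    pattern_score = min(1.0, hits * 0.6)
    entropy_score = 0.3 if entropy > 5.0 else 0.0
    score = min(1.0, pattern_score + entropy_score)

    return ScanResult(hits, entropy, score, score >= threshold)
```

Because the function is pure and deterministic, the same input always yields the same score, which is what makes this style of scanner auditable. The same `scan` call could be applied to tool descriptions and tool responses as well as to user input.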

NVIDIA NeMo Guardrails

Approach: Colang-based dialog management with programmable guardrails. Uses a domain-specific language to define conversation flows and safety rules.

Strengths: Flexible dialog management beyond just injection detection. Can enforce conversation structure, topic boundaries, and output formatting. Active development backed by NVIDIA.

Tradeoffs: Steeper learning curve due to the Colang DSL. Adds latency from the dialog management layer. More focused on conversation control than on attack detection specifically.

Best for: Applications that need structured conversation management alongside safety, especially those already in the NVIDIA ecosystem.

Lakera Guard

Approach: Hosted API service with ML-based injection detection. Send text, get a risk score.

Strengths: Simple API integration. ML classifier trained on large injection datasets. Continuously updated models. Low integration effort.

Tradeoffs: Requires network calls to an external service (a latency and availability dependency). Proprietary classifier is a black box. Per-request pricing. Your prompts are sent to a third party.

Best for: Teams that want fast integration without building detection infrastructure, and are comfortable with a hosted dependency.
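The availability dependency means any hosted detector needs an explicit policy for what happens when the service is slow or unreachable. The sketch below shows one way to express that policy. It does not use Lakera's real client or endpoint: `score_remote` is a hypothetical stand-in for whatever vendor call you make, assumed to return a risk score between 0 and 1 and to raise an `OSError` subclass (such as `TimeoutError` or `ConnectionError`) on failure.

```python
from typing import Callable


def check_hosted(
    text: str,
    score_remote: Callable[[str], float],
    threshold: float = 0.5,
    fail_closed: bool = True,
) -> bool:
    """Return True if the text should be blocked.

    score_remote is a hypothetical wrapper around the vendor API call.
    """
    try:
        score = score_remote(text)
    except OSError:
        # Timeout or network failure: block everything (fail closed)
        # or let the request through (fail open), depending on policy.
        return fail_closed
    return score >= threshold
```

Failing closed protects the model at the cost of rejecting legitimate traffic during an outage. Failing open does the reverse. Which one is right depends on what the downstream agent is allowed to do.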

Custom Solutions

Approach: Build your own using open-source classifiers (DeBERTa fine-tuned on injection datasets), regex patterns, and application-specific logic.

Strengths: Full control. Can be optimized for your specific attack surface. No external service dependencies or per-request fees.

Tradeoffs: Significant engineering investment. Requires ongoing maintenance as attack techniques evolve. No community-maintained pattern updates.

Best for: Organizations with dedicated ML security teams who need maximum control and customization.

Making the Choice

The right choice depends on your constraints:

Latency-sensitive, no external calls: Aegis
Structured dialog control: NeMo Guardrails
Fastest integration, SaaS acceptable: Lakera Guard
Maximum customization, dedicated team: Custom build

Most production systems benefit from combining approaches: a fast in-process scanner for the first pass, with a secondary classifier for flagged inputs.
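A two-stage cascade of this kind can be sketched as follows. This is an illustrative outline under stated assumptions, not any vendor's implementation. The single regex in `SUSPICIOUS` stands in for a full in-process scanner, and `classify` is a hypothetical callable (a local model or a hosted API) that returns a risk score between 0 and 1.

```python
import re
from typing import Callable

# Stand-in for a full in-process scanner: one deliberately broad pattern.
SUSPICIOUS = re.compile(
    r"(ignore|disregard|override).{0,40}(instructions|system prompt)",
    re.I | re.S,
)


def fast_scan(text: str) -> bool:
    """Cheap first pass: True if the text looks suspicious."""
    return SUSPICIOUS.search(text) is not None


def is_injection(
    text: str,
    classify: Callable[[str], float],
    threshold: float = 0.8,
) -> bool:
    """Run the expensive classifier only on inputs the first pass flags."""
    if not fast_scan(text):
        return False  # Clean on the first pass; skip the classifier.
    return classify(text) >= threshold
```

In this arrangement the first pass should be tuned for recall, since anything it misses never reaches the classifier. The classifier then removes the false positives that a broad first pass produces, and only flagged inputs pay its latency cost.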