AI Sec Reviews

Lakera Guard: Prompt Injection Detection in Practice

Lakera Guard is purpose-built for prompt injection detection rather than general content moderation. A documentation- and feature-based look at what it covers, how it integrates, what it costs, and how it compares to alternatives.

By AI Sec Reviews Editorial · 8 min read

Lakera Guard is one of the few AI security products designed specifically for prompt injection detection rather than general content moderation. The positioning matters: prompt injection is a distinct threat class from toxicity or harmful content generation, and it requires different detection approaches.

This is a documentation- and feature-based evaluation: what Lakera Guard claims to cover, how it integrates, how it’s priced, and how it sits relative to alternatives. Claims about capabilities and detection rates below are drawn from Lakera’s own published material and third-party sources, attributed where they appear. It is structured around our AI security tool evaluation framework. Treat vendor-stated detection figures as marketing benchmarks rather than independently reproduced numbers, and validate them against your own traffic and threat model before relying on them.

What Lakera Guard does

Per Lakera’s API documentation, Lakera Guard exposes a real-time API (“Prompt Defense” guardrail) that classifies LLM inputs and reference content for:

  • Direct prompt injection (user inputs attempting to override system instructions)
  • Indirect prompt injection (injected content in retrieved documents, tool outputs, or other external content)
  • Jailbreak attempts and system prompt extraction
  • PII detection, data-leakage prevention, and content-violation detection (secondary capabilities)

Classification is delivered through an API call that returns a flag plus category details, and it is designed to sit in the pre-processing pipeline ahead of your LLM call. Lakera states that the guardrail scans both user inputs and retrieved/reference documents, and that it works with any LLM provider (OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, or self-hosted models). Coverage is advertised across 100+ languages.
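The integration pattern described above amounts to a gate in front of the model call. The following is a minimal sketch, not Lakera's SDK: `GuardResult`, `guarded_completion`, and `PromptBlocked` are hypothetical names, and `classify` stands in for whatever client code sends text to the Guard API and parses its reply. Take the actual endpoint, request body, and response fields from Lakera's API documentation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GuardResult:
    """Hypothetical shape of a guard verdict: a flag plus category details."""
    flagged: bool
    categories: dict[str, bool] = field(default_factory=dict)


# Any callable that sends text to the guard service and parses its reply.
Classifier = Callable[[str], GuardResult]


class PromptBlocked(Exception):
    """Raised when the guard flags an input before it reaches the LLM."""


def guarded_completion(user_input: str, classify: Classifier,
                       call_llm: Callable[[str], str]) -> str:
    # 1. Screen the input first; the model is never called on flagged text.
    verdict = classify(user_input)
    if verdict.flagged:
        hits = sorted(name for name, hit in verdict.categories.items() if hit)
        raise PromptBlocked(f"blocked by guard: {', '.join(hits) or 'unspecified'}")
    # 2. Only clean inputs are forwarded; call_llm can wrap any provider.
    return call_llm(user_input)
```

Because both the classifier and the model call are passed in as functions, the same gate works regardless of which LLM provider sits behind `call_llm`, which mirrors the provider-agnostic positioning Lakera describes.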

Detection coverage (as published)

Direct prompt injection and jailbreaks: This is the core use case. Lakera markets Guard as a purpose-built detector for the classic override patterns (“ignore previous instructions,” role-override, persona-style jailbreaks), and its public materials cite a 98%+ detection rate. As with every detection tool in this category, published rates are measured against the vendor’s own evaluation set; adversarially optimized bypasses, encoding-based obfuscation, and novel framing remain the hardest cases industry-wide, and no published rate should be read as a guarantee against a determined attacker.
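Validating a published rate against your own data comes down to counting outcomes on a labeled sample. The sketch below is minimal and assumes two things you supply: `(text, is_attack)` pairs you have assembled yourself, and an `is_flagged` function that wraps the guard call and returns its boolean verdict. Both are hypothetical. Including encoded, obfuscated, and paraphrased attacks in the sample is what exercises the hard cases named above.

```python
from typing import Callable, Iterable


def evaluate_detector(samples: Iterable[tuple[str, bool]],
                      is_flagged: Callable[[str], bool]) -> dict[str, float]:
    """Compare detector verdicts with your own labels.

    samples: (text, is_attack) pairs drawn from your traffic and red-team set.
    is_flagged: wraps the guard call and returns its boolean verdict.
    """
    tp = fp = fn = tn = 0
    for text, is_attack in samples:
        flagged = is_flagged(text)
        if is_attack and flagged:
            tp += 1          # attack caught
        elif is_attack:
            fn += 1          # attack missed
        elif flagged:
            fp += 1          # benign prompt wrongly blocked
        else:
            tn += 1          # benign prompt passed
    attacks, benign = tp + fn, fp + tn
    return {
        # Share of known attacks the detector caught (the "detection rate").
        "detection_rate": tp / attacks if attacks else float("nan"),
        # Share of benign prompts blocked; this is what users experience as friction.
        "false_positive_rate": fp / benign if benign else float("nan"),
    }
```

Track the false-positive rate alongside the detection rate. A detector that blocks aggressively can post a high detection rate while also rejecting legitimate prompts.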

Indirect prompt injection: This is where Lakera’s differentiation is most visible. Many competing tools either lack indirect-injection detection or treat it as secondary to direct injection. Lakera’s documented approach, which classifies retrieved documents and tool outputs for embedded instructions rather than only the user turn, is a meaningful feature for retrieval-augmented and agentic applications. In those systems the attack often arrives inside content the model is asked to summarize or act on, rather than from the user directly.
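In a RAG pipeline, scanning reference content means classifying each retrieved chunk before it is placed in the prompt. The following is a minimal sketch under stated assumptions: `build_context` and `is_flagged` are hypothetical names, `is_flagged` wraps the guard call, and the user turn is assumed to be screened separately.

```python
from typing import Callable


def build_context(user_question: str, retrieved: list[str],
                  is_flagged: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Screen every retrieved chunk, not just the user turn.

    Returns the prompt built from clean chunks plus the chunks that were dropped.
    (The user question itself is assumed to be screened by a separate input gate.)
    """
    kept: list[str] = []
    dropped: list[str] = []
    for chunk in retrieved:
        # Indirect injection arrives inside content the model will read,
        # so each document or tool output is classified before it enters the prompt.
        if is_flagged(chunk):
            dropped.append(chunk)
        else:
            kept.append(chunk)
    context = "\n\n".join(kept)
    prompt = f"Context:\n{context}\n\nQuestion: {user_question}"
    return prompt, dropped
```

This sketch drops a flagged chunk entirely. Returning the dropped list keeps that decision visible for logging or review, so a poisoned source document can be traced rather than silently disappearing.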

PII and data leakage: Lakera has stated that its enhanced PII detection and DLP capabilities are available to both SaaS and self-hosted customers, covering PII detection/redaction, secrets detection, and custom data-pattern matching on inputs and outputs. This is a secondary capability rather than the reason to choose the product.
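To make "custom data-pattern matching" concrete, the sketch below shows the general idea locally: named regular expressions applied to text, with matches replaced by tags. It is an illustration of the concept, not how Lakera implements or configures its DLP rules. The pattern set is hypothetical, and the `ACME-` ticket format in particular is made up.

```python
import re

# Hypothetical rule set: named regexes maintained by the deploying team.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "internal_ticket": re.compile(r"\bACME-\d{4,}\b"),  # made-up internal ID format
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [REDACTED:<name>] tag; report which rules fired."""
    fired = []
    for name, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            fired.append(name)
    return text, fired
```

The same function can run on inputs before the model call and on outputs before they reach the user. That matches the input-and-output coverage Lakera describes for its DLP features.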

Latency and integration model

Lakera advertises sub-50ms runtime latency and states the guardrail “deploys in minutes” with no changes to your models or prompts. At that order of magnitude, the API adds measurable but generally tolerable overhead for most user-facing interactions. Under a hard sub-100ms budget, however, it is a real constraint, and you should benchmark end-to-end latency, including the network round-trip, in your own region and traffic profile. The self-hosted option keeps inference inside your environment, which is the relevant lever when network round-trips to an external service are the latency or compliance concern.
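A latency benchmark only needs a timer and tail percentiles. This is a minimal sketch: `benchmark_latency` and `fits_budget` are hypothetical helpers, and `call` is assumed to be a zero-argument function that issues one real guard request from your own region with a representative payload.

```python
import statistics
import time
from typing import Callable


def benchmark_latency(call: Callable[[], object], runs: int = 200) -> dict[str, float]:
    """Time repeated calls and return p50/p95/p99 latency in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def fits_budget(stats: dict[str, float], budget_ms: float, pct: str = "p95") -> bool:
    """A tail percentile, not the mean, decides whether a latency budget is met."""
    return stats[pct] <= budget_ms
```

Comparing p95 or p99, rather than the average, against the budget reflects what slower requests actually experience. A vendor's runtime figure also excludes the network hop that this end-to-end measurement captures.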

Cost model

Lakera Guard uses a tiered, request-based pricing model:

  • Developer (free): on the order of 10,000 API calls/month with core models and community support.
  • Pro (contact for pricing): higher volume, advanced detection models, and dedicated email support.
  • Enterprise (custom): on-premise or private-cloud hosting, custom-trained models, audit logging, SIEM integration, SLAs, and high-volume rates.

Because pricing scales with request volume, cost tracks traffic directly. The self-hosted and enterprise tiers are priced separately and are aimed at compliance environments where calls to external services are restricted. Confirm current pricing with Lakera, as published tiers and limits change.
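Because cost tracks traffic, a rough call-volume estimate shows which tier applies. This is a minimal sketch with hypothetical names. The per-request call count is an assumption: one call for the user input, one per retrieved chunk scanned separately, and optionally one for the model output. Whether Lakera bills a multi-document request as one call or several should be confirmed with Lakera.

```python
import math

FREE_TIER_CALLS = 10_000  # approximate Developer-tier quota listed above; confirm with Lakera


def monthly_guard_calls(requests_per_day: float, chunks_per_request: float,
                        scan_output: bool = False, days: int = 30) -> int:
    """Estimate guard API calls per month.

    Assumes one call for the user input, one per retrieved chunk scanned
    separately, and optionally one for the model output. Batching several
    items into a single call (if your integration does that) lowers the count.
    """
    per_request = 1 + chunks_per_request + (1 if scan_output else 0)
    return math.ceil(requests_per_day * per_request * days)
```

For example, 100 requests a day with four retrieved chunks and output scanning gives 18,000 calls a month, which exceeds the free tier's roughly 10,000. Screening the user input alone gives 3,000.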

Comparison to alternatives

Rebuff (open source): Lower cost (self-hosted), weaker indirect-injection coverage, reasonable direct-injection detection. A good starting point if cost is the constraint, but it is closer to a research project than a maintained production tool.

LangKit/WhyLabs: More oriented toward general LLM monitoring and observability than injection detection specifically. Better fit for the observability use case than the security use case.

OpenAI Moderation API: Fast and free, but not designed for injection detection. Catches some obvious jailbreaks incidentally; misses most injection-specific patterns. Reasonable as a $0 baseline rail if you are already on OpenAI.

NeMo Guardrails: More capability but more complexity, and the only option here offering multi-turn conversation flow control (via Colang). If you need dialog-flow control in addition to injection detection, the overhead can be worth it; if you only need injection detection, Lakera is more targeted.

Verdict

On documentation and feature coverage, Lakera Guard is among the most clearly purpose-built prompt injection detection tools available, and its indirect-injection handling is a genuine differentiator for RAG and agentic systems. For teams whose primary threat model is injection-based attacks — common for consumer-facing applications — it is a strong candidate for the dedicated detection layer.

It is a layer in a stack, not a complete security solution. It does not replace content classification (use Llama Guard or OpenAI Moderation API for that), output monitoring, or behavioral anomaly detection. Before committing, benchmark detection and latency against your own traffic rather than relying on vendor-stated figures.
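Treating Guard as one layer among several can be expressed as independent input and output rails. The following is a minimal sketch: `run_rails`, `layered_call`, and the rail functions are hypothetical placeholders. In practice, an injection rail would wrap Lakera Guard, and a content rail might wrap Llama Guard or the OpenAI Moderation API. Behavioral anomaly detection is not modeled here.

```python
from typing import Callable, Optional

# A rail inspects text and returns a reason string if it objects, else None.
Rail = Callable[[str], Optional[str]]


def run_rails(text: str, rails: dict[str, Rail]) -> list[str]:
    """Run every rail and collect objections as '<rail>: <reason>'."""
    findings = []
    for name, rail in rails.items():
        reason = rail(text)
        if reason is not None:
            findings.append(f"{name}: {reason}")
    return findings


def layered_call(user_input: str, input_rails: dict[str, Rail],
                 output_rails: dict[str, Rail],
                 call_llm: Callable[[str], str]) -> str:
    # Input layer: e.g. an injection detector plus a content classifier.
    if findings := run_rails(user_input, input_rails):
        return "Request blocked (" + "; ".join(findings) + ")"
    reply = call_llm(user_input)
    # Output layer: checking the response is a separate control from input screening.
    if findings := run_rails(reply, output_rails):
        return "Response withheld (" + "; ".join(findings) + ")"
    return reply
```

Running every rail instead of stopping at the first objection records all findings for a request. This is useful when deciding which layer is doing the work.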

For broader AI security product comparisons across the stack, bestaisecuritytools.com maintains updated benchmark data.

Sources

  1. Lakera Guard product page
  2. Lakera API documentation (Prompt Defense)
  3. OWASP LLM Top 10