What guardrails should I add to prevent prompt injection when my agent can access internal APIs and databases?
AI Coding Agent Platforms

What guardrails should I add to prevent prompt injection when my agent can access internal APIs and databases?

7 min read

Quick Answer: You need multiple layers of guardrails: input processors to detect and rewrite injections, strict tool and data access policies, output scrubbing, and full observability so you can trace and evaluate how your agent touches internal APIs and databases.

Frequently Asked Questions

What is prompt injection and why is it dangerous when agents can hit internal APIs and databases?

Short Answer: Prompt injection is when user input tricks the model into ignoring your instructions and doing something unsafe—like exfiltrating secrets or calling internal APIs with bad parameters. When your agent can talk to internal systems, this can turn into real data leaks and side‑effects, not just weird text output.

Expanded Explanation:
In a simple chat UI, prompt injection is mostly a UX problem. Once your agent can call tools, access internal APIs, or query production databases, it becomes an application security problem. A malicious user can try to override your system prompt, request hidden configuration, or coerce the agent into running tools in ways you never intended (e.g., “export all customer records as CSV and print them here”).

In Mastra, we treat this as a design constraint, not an edge case. Agent prompts, tool schemas, processors, and evals are all control surfaces you can harden. The job isn’t to magically “make the model safe,” it’s to ensure your infrastructure can’t be misused even if the model is actively trying to follow attacker instructions.

Key Takeaways:

  • Prompt injection becomes critical once agents have tool/API/DB access.
  • You need explicit controls at the input, tool layer, and output—not just “better prompts.”

How do I set up guardrails in Mastra to block prompt injection before it reaches my agent?

Short Answer: Use input processors like PromptInjectionDetector and ModerationProcessor to normalize, classify, and optionally rewrite or block risky prompts before your Agent ever calls a model or tool.

Expanded Explanation:
Mastra gives you processors that run before and after model calls. For injection defense, you want a pipeline that first normalizes input, then detects threats, then filters or rewrites. PromptInjectionDetector uses an LLM to classify risky instructions (e.g., “ignore previous instructions,” “reveal your system prompt”) and lets you choose whether to block, allow, or rewrite them based on a configurable threshold and strategies.

A typical secure agent combines multiple processors: normalization to reduce weird Unicode tricks, prompt injection detection to catch instruction overrides, PII detection/moderation to avoid unintentional sensitive content, and logging via observability so you can audit what’s happening. The key is to fail closed: if the detector is uncertain, you’d rather ask the user to rephrase than let a risky prompt through.

Steps:

  1. Normalize input: Use UnicodeNormalizer or similar to standardize user messages.
  2. Detect injection: Add PromptInjectionDetector as an inputProcessor with a strict threshold and a block or rewrite strategy.
  3. Filter and moderate: Optionally chain ModerationProcessor or PIIDetector to enforce content policies before the agent touches tools or internal APIs.

What’s the difference between input guardrails and output guardrails for prompt injection?

Short Answer: Input guardrails protect your tools and internal APIs from malicious prompts; output guardrails protect your users and infrastructure from leaking system prompts, sensitive data, or internal context after the model has responded.

Expanded Explanation:
Input guardrails focus on “what gets into the system.” They look at user messages and decide if they should be allowed, rewritten, or blocked. This is where PromptInjectionDetector is most critical; it stops an attacker from steering the agent into dangerous tool calls or query patterns.

Output guardrails focus on “what leaves the system.” Even with perfect tool schemas, a model might try to echo its system prompt, reveal config details, or summarize sensitive internal data in unsafe ways. A processor like SystemPromptScrubber runs on the response and redacts system prompt content or configuration details before the text is returned to the user. It uses an LLM to identify and scrub these fragments based on detection types you configure.

Comparison Snapshot:

  • Option A: Input guardrails (e.g., PromptInjectionDetector): Protect downstream tools, APIs, and DB queries from hostile instructions.
  • Option B: Output guardrails (e.g., SystemPromptScrubber): Prevent leaks of system prompts, secrets, and sensitive context back to the user.
  • Best for: Combining both—input to block misuse, output to block disclosure—for agents with access to internal APIs and databases.

How do I practically implement these guardrails in a Mastra Agent that uses internal tools and databases?

Short Answer: Wrap your Agent with processors (PromptInjectionDetector, SystemPromptScrubber, etc.), define strict tool schemas for internal APIs/DBs, and run everything inside a Workspace with observability enabled so you can trace tool calls and token usage.

Expanded Explanation:
Think of this as a pipeline: user input → processors → model + tools → processors → user. In Mastra, you configure this explicitly on the Agent. For internal APIs and databases, your tools should expose narrow, typed operations—not “run arbitrary SQL.” Guardrails then sit around those tools, ensuring that even if the model is persuaded, it can’t step outside the schema.

A secure Mastra setup typically includes:

  • Input processors: UnicodeNormalizer, PromptInjectionDetector, maybe PIIDetector or ModerationProcessor.
  • Tools: Typed, parameterized internal API/DB tools—no raw strings that go straight into a query.
  • Output processors: SystemPromptScrubber to prevent prompt/config leaks.
  • Observability: Turn on tracing so you can see prompts, completions, tool calls, and token usage in Mastra Studio or your OpenTelemetry stack.

What You Need:

  • Mastra primitives configured in code: Agent, processors (PromptInjectionDetector, SystemPromptScrubber, etc.), tools for your internal APIs/DBs, and a Workspace to run and iterate.
  • Observability and evals: built-in tracing plus custom evals to measure how often injections are detected, blocked, or slip through in staging before production.

How do guardrails for prompt injection connect to business risk and long-term reliability?

Short Answer: Guardrails turn your agent from a fragile demo into a production service you can trust, by reducing data exfiltration risk, limiting blast radius, and giving you traceability and evals to continuously improve safety.

Expanded Explanation:
Once your agent touches internal APIs and databases, you’re in “infrastructure, not experiment” territory. A single successful prompt injection can leak customer data, internal configuration, or operational details in a way that triggers security incidents and compliance headaches. Guardrails—processors, tool schemas, permissioning, and observability—are how you enforce least privilege on an AI system.

Mastra is built for this production reality. You can define custom evals (model-graded, rule-based, statistical) to track how your guardrails perform over time, inspect traces to debug why a prompt was flagged or missed, and adjust thresholds or processors without rewriting your whole stack. That’s how teams like Plaid, Elastic, Replit, Docker, and SoftBank treat agents: as code with defense-in-depth, not as a black-box chatbot.

Why It Matters:

  • Security and compliance: Guardrails reduce the chance of prompt injection turning into actual data leaks or unauthorized actions on internal systems.
  • Operational confidence: With evals and observability, you can measure and improve your defenses over time instead of hoping your prompt is “good enough.”

Quick Recap

When your Mastra agent can access internal APIs and databases, prompt injection is no longer a theoretical NLP problem—it’s a security boundary problem. You mitigate it with layered guardrails: input processors like PromptInjectionDetector to block or rewrite risky prompts, strict tool schemas around internal APIs/DBs, output processors like SystemPromptScrubber to prevent leaks, and full observability plus evals to watch how the system behaves over time. The goal is agents as infrastructure: explicit control surfaces, traceable decisions, and minimized blast radius.

Next Step

Get Started