
Guardrails for production agents: approvals, limits, and fallbacks

Without guardrails, your agent will make decisions you didn't anticipate and can't control. Here's how to prevent that.

An AI agent running without guardrails is a bet. You're hoping it makes good decisions. In production, you can't afford to hope.

Guardrails are the constraints that keep your agent on track: approval gates, decision limits, and fallback procedures. They're the difference between an agent you trust and an agent that surprises you at scale.

The guardrail types you need

Approval gates filter decisions that are too risky, too costly, or too important to let the agent decide alone. The question is: which decisions need approval?

Some decisions are always risky. Refunding a large order. Deleting user data. Changing payment information. These probably always need approval, no matter what your agent thinks.

Some decisions are risky above certain thresholds. Offering a refund? Maybe autonomous below $100, approval required above. Escalating a customer? Maybe autonomous for tier-1 issues, approval for tier-2.

The rest can run autonomously. The agent decides and acts. You monitor but don't require approval.
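
The three buckets above can be sketched as a small routing function. This is a minimal illustration, not a definitive implementation: the `Decision` type, the `route` function, and the action names are hypothetical, and the $100 refund threshold is just the example figure from the text. Treating exactly $100 as "needs approval" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class Decision:
    action: str          # hypothetical action name, e.g. "refund"
    amount: float = 0.0  # dollar value, if the action has one


# Assumed policy tables; tune these to your own risk assessment.
ALWAYS_APPROVE = {"delete_user_data", "change_payment_info"}
AMOUNT_THRESHOLDS = {"refund": 100.0}  # autonomous below, approval at or above


def route(decision: Decision) -> Route:
    """Decide whether a proposed decision may run without a human."""
    # Bucket 1: always risky, regardless of amount.
    if decision.action in ALWAYS_APPROVE:
        return Route.NEEDS_APPROVAL
    # Bucket 2: risky only at or above a threshold.
    threshold = AMOUNT_THRESHOLDS.get(decision.action)
    if threshold is not None and decision.amount >= threshold:
        return Route.NEEDS_APPROVAL
    # Bucket 3: everything else runs autonomously (and gets monitored).
    return Route.AUTONOMOUS
```

Keeping the policy in plain data tables, rather than in the agent's prompt, means the agent can't talk its way past the gate.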

Decision limits constrain what the agent can do. How much money can it authorize? How many retries can it attempt? How many external API calls can it make per request?

Without limits, an agent might accidentally retry something 100 times, burning through budget and rate limits. Or it might authorize a discount that's too large. Or it might make so many API calls that it breaks your external integrations.

Limits are straightforward to implement and surprisingly powerful. A $10,000 authorization limit, for example, caps the damage any single decision can do, no matter how wrong the agent's reasoning was.
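
Here's one way per-request limits might look in code. It's a sketch under assumptions: the `RequestLimits` class and its method names are hypothetical, the $10,000 cap is the figure from the text, and the retry and API-call caps are placeholder values.

```python
from dataclasses import dataclass


class LimitExceeded(Exception):
    """Raised when an action would push the agent past a configured limit."""


@dataclass
class RequestLimits:
    max_authorized_usd: float = 10_000.0  # figure from the text
    max_retries: int = 3                  # assumed placeholder
    max_api_calls: int = 20               # assumed placeholder
    authorized_usd: float = 0.0
    retries: int = 0
    api_calls: int = 0

    def authorize(self, amount_usd: float) -> None:
        """Record a spend, refusing it if the running total would exceed the cap."""
        if self.authorized_usd + amount_usd > self.max_authorized_usd:
            raise LimitExceeded("authorization limit reached")
        self.authorized_usd += amount_usd

    def record_retry(self) -> None:
        if self.retries >= self.max_retries:
            raise LimitExceeded("retry limit reached")
        self.retries += 1

    def record_api_call(self) -> None:
        if self.api_calls >= self.max_api_calls:
            raise LimitExceeded("API call limit reached")
        self.api_calls += 1
```

Create one `RequestLimits` per request and call the matching method before each action. A refused action raises instead of silently proceeding, so the caller has to decide what happens next.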

Fallback procedures define what happens when the agent fails. If the agent times out, does it return an error or queue for later? If an external API fails, does it retry or escalate? If the agent is uncertain, does it make a best guess or ask for human help?

Without fallback procedures, a single failure cascades, and your agent breaks at the worst possible times.

How to design guardrails

Start with risk. What are the worst things your agent could do? What decisions, if wrong, would cause the most damage? Those decisions probably need guardrails.

A bad invoice processing decision costs money. A bad appointment booking decision wastes customer time. A bad data deletion decision is a data loss incident. These need constraints.

Next, look at your team's tolerance. How much autonomous authority does your agent get before it needs approval? Some teams are comfortable with agents making decisions autonomously. Others prefer more control. Your guardrails should reflect your risk tolerance.

Then think about what breaks your integrations. If you call an external API, what happens if it's slow or fails? You need fallbacks that don't break your agent or your user experience.

Approval gate patterns

Human-in-the-loop is the simplest pattern. Some decisions go to a human queue. A human reviews them. If approved, they execute. If rejected, they're discarded or escalated.

This is safe but slow. If you're human-approving too many decisions, your agent isn't saving anyone time.
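
A human-in-the-loop queue can be sketched in a few lines. The `ApprovalQueue` and `PendingAction` names are hypothetical, and the reviewer is modeled as a plain function; in a real system it would be a person working through a review UI.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], str]  # the side effect, deferred until approval


@dataclass
class ApprovalQueue:
    pending: list[PendingAction] = field(default_factory=list)
    rejected: list[PendingAction] = field(default_factory=list)

    def submit(self, action: PendingAction) -> None:
        """Park an action until a human has looked at it."""
        self.pending.append(action)

    def review(self, approve: Callable[[PendingAction], bool]) -> list[str]:
        """Execute every pending action the reviewer approves.

        Rejected actions are kept in `rejected` so the caller can
        discard or escalate them.
        """
        results = []
        for action in self.pending:
            if approve(action):
                results.append(action.execute())
            else:
                self.rejected.append(action)
        self.pending.clear()
        return results
```

The key design choice is that the agent only describes the action and hands over a deferred callable. Nothing executes until `review` runs.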

Threshold-based approval is smarter. Decisions below a threshold run autonomously. Decisions above a threshold require approval. Example: the agent issues a refund under $100 on its own, but a larger refund waits for approval.

Time-based gates let low-risk decisions run immediately but flag them for review afterward. Example: the agent books and confirms an appointment right away, but a human reviews the transcript later to make sure the booking was appropriate.

Conditional approval lets different decisions have different approval rules. Example: refunds under $50 are automatic. Refunds between $50 and $500 need manager approval. Refunds over $500 need director approval.
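
The refund tiers in that example translate directly into code. The function name and return labels are hypothetical, and treating exactly $50 and exactly $500 as manager-level is an assumption, since the example doesn't say which side the boundaries fall on.

```python
def refund_approver(amount_usd: float) -> str:
    """Return who must approve a refund of the given size."""
    if amount_usd < 0:
        raise ValueError("refund amount cannot be negative")
    if amount_usd < 50:
        return "auto"      # under $50: no approval needed
    if amount_usd <= 500:
        return "manager"   # $50 to $500: manager approval
    return "director"      # over $500: director approval
```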

Risk-based gates use the agent's own confidence. If the agent is very confident, decisions run autonomously. If confidence is low, they need approval. This lets you leverage the agent's own uncertainty.

Fallback examples

For external API calls: retry up to 3 times with exponential backoff. If still failing, escalate to a human. Don't fail silently.
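
A minimal sketch of that retry policy, assuming a hypothetical `call_with_retries` helper and a 0.5 second base delay (the text only fixes the retry count). Catching every `Exception` is for brevity; in practice you'd retry only errors you know are transient.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class EscalateToHuman(Exception):
    """Signals that automated retries are exhausted and a person should look."""


def call_with_retries(
    call: Callable[[], T],
    max_retries: int = 3,          # retries after the first attempt
    base_delay_s: float = 0.5,     # assumed starting delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                # Exponential backoff: 0.5s, 1s, 2s with the defaults.
                sleep(base_delay_s * 2 ** attempt)
    # Never fail silently: surface the failure as an explicit escalation.
    raise EscalateToHuman(
        f"call failed after {max_retries} retries"
    ) from last_error
```

The `sleep` parameter exists so tests can pass a fake and run instantly.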

For agent timeout: if the agent doesn't respond within 30 seconds, queue the request. Process it manually, or retry it once the agent is responsive again.
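
If your agent call is async, the timeout fallback might look like this. The `run_with_timeout` function is hypothetical, and a plain list stands in for whatever durable queue you'd actually use.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_with_timeout(
    agent: Callable[[str], Awaitable[str]],
    request: str,
    manual_queue: List[str],
    timeout_s: float = 30.0,  # figure from the text
) -> Optional[str]:
    """Return the agent's answer, or queue the request if it takes too long."""
    try:
        # wait_for cancels the agent call when the timeout expires.
        return await asyncio.wait_for(agent(request), timeout=timeout_s)
    except asyncio.TimeoutError:
        manual_queue.append(request)
        return None
```

Returning `None` forces the caller to handle the "queued for later" case explicitly instead of treating it as a normal answer.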

For agent uncertainty: if the agent's confidence is below 70%, escalate to a human instead of guessing. It's better to ask than to guess wrong.
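
As a sketch, the confidence check is a single comparison. This assumes your agent exposes a confidence score between 0 and 1; how that score is produced is outside this example, and the function name is hypothetical.

```python
CONFIDENCE_FLOOR = 0.70  # the 70% figure from the text


def act_or_escalate(answer: str, confidence: float) -> tuple[str, str]:
    """Return ("act", answer) or ("escalate", answer) based on confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < CONFIDENCE_FLOOR:
        # Hand the draft answer to a human instead of acting on a guess.
        return ("escalate", answer)
    return ("act", answer)
```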

For rate limits: track API call budget. If you're approaching the limit, queue requests instead of making them immediately.
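
One way to track a call budget is a sliding window with a soft limit set below the provider's real limit. The `CallBudget` class and its parameters are hypothetical, and the numbers you'd pass in depend on the API you're calling.

```python
import time
from collections import deque
from typing import Callable


class CallBudget:
    """Sliding-window call counter that defers requests near the limit."""

    def __init__(
        self,
        soft_limit: int,       # calls allowed per window before deferring
        window_s: float,       # window length in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.soft_limit = soft_limit
        self.window_s = window_s
        self._clock = clock
        self._timestamps: deque = deque()
        self.deferred: deque = deque()

    def _used(self) -> int:
        """Count calls made within the current window."""
        now = self._clock()
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()
        return len(self._timestamps)

    def submit(self, request: str) -> bool:
        """Return True if the caller may send now; otherwise queue the request."""
        if self._used() >= self.soft_limit:
            self.deferred.append(request)
            return False
        self._timestamps.append(self._clock())
        return True
```

A worker can drain `deferred` once the window frees up, so queued requests are delayed rather than dropped.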

For data-modifying operations: log every change the agent makes. If something goes wrong, you can audit and potentially revert.
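
Here's a toy version of a change log that supports reverting, using an in-memory dict as the data store. The `AuditedStore` and `Change` names are hypothetical; a production system would log to durable storage and record who or what made each change.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # marks "key did not exist" in the log


@dataclass(frozen=True)
class Change:
    key: str
    old: Any
    new: Any


class AuditedStore:
    def __init__(self) -> None:
        self.data: dict = {}
        self.log: list = []  # append-only record of every write

    def _write(self, key: str, value: Any) -> Change:
        change = Change(key, self.data.get(key, _MISSING), value)
        self.log.append(change)
        if value is _MISSING:
            self.data.pop(key, None)
        else:
            self.data[key] = value
        return change

    def set(self, key: str, value: Any) -> Change:
        """Write a value and record what it replaced."""
        return self._write(key, value)

    def revert(self, change: Change) -> Change:
        """Restore the value from before `change`; the revert is logged too."""
        return self._write(change.key, change.old)
```

Because reverts go through the same logged write path, the audit trail stays complete even after you undo something.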

Common guardrail mistakes

Setting approval thresholds too conservatively. If you require approval for 80% of decisions, you haven't automated anything. Be bold enough to let the agent work.

Not testing guardrail logic. Your guardrails can have bugs too. Test that approval gates actually fire when they should. Test that fallbacks actually handle failures.
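
Guardrail tests can be ordinary unit tests. The sketch below tests a deliberately tiny, hypothetical `needs_approval` gate; the point is the shape of the tests, which check both sides of the threshold and the always-gated case.

```python
import unittest


def needs_approval(action: str, amount_usd: float, threshold_usd: float = 100.0) -> bool:
    """Hypothetical gate: destructive actions and large amounts need a human."""
    return action == "delete_user_data" or amount_usd >= threshold_usd


class ApprovalGateTests(unittest.TestCase):
    def test_small_refund_runs_autonomously(self):
        self.assertFalse(needs_approval("refund", 99.99))

    def test_gate_fires_at_threshold(self):
        # Boundary values are where off-by-one bugs in gates tend to hide.
        self.assertTrue(needs_approval("refund", 100.0))

    def test_destructive_action_is_always_gated(self):
        self.assertTrue(needs_approval("delete_user_data", 0.0))
```

Run it with `python -m unittest` like any other test module.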

Ignoring edge cases in your guardrails. What happens if someone tries to bypass the approval gate? What if an approval request times out? What if both the primary and fallback path fail?

Forgetting to monitor guardrails. Are approvals actually happening? Are fallbacks being triggered? Are your thresholds right? Monitor this and adjust based on what you learn.

Implementation notes

Build guardrails into your agent architecture from the start. Don't treat them as an afterthought. Your core agent logic should assume guardrails exist.

Make guardrails configurable. What gets approval, what the thresholds are, what the limits are. You'll adjust these as you learn from production.
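
One possible shape for that configuration is a frozen dataclass loaded from JSON. The field names are hypothetical; the defaults reuse figures mentioned elsewhere in this article.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GuardrailConfig:
    refund_auto_approve_below_usd: float = 100.0
    max_authorization_usd: float = 10_000.0
    max_retries: int = 3
    agent_timeout_s: float = 30.0
    min_confidence: float = 0.70

    @classmethod
    def from_json(cls, text: str) -> "GuardrailConfig":
        """Build a config from JSON overrides, rejecting unknown keys."""
        overrides = json.loads(text)
        allowed = {f.name for f in fields(cls)}
        unknown = set(overrides) - allowed
        if unknown:
            # A typo in a guardrail setting should fail loudly, not be ignored.
            raise ValueError(f"unknown guardrail settings: {sorted(unknown)}")
        return cls(**overrides)
```

For example, `GuardrailConfig.from_json('{"max_retries": 5}')` changes one limit and keeps every other default, so adjusting a threshold doesn't require a code change.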

Log guardrail decisions. When you escalate to a human, when you hit a limit, when you trigger a fallback. Log it all so you can audit and improve.
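
Structured log lines make guardrail events easy to search and count later. This sketch uses Python's standard `logging` and `json` modules; the `log_guardrail_event` helper and its field names are hypothetical.

```python
import json
import logging

logger = logging.getLogger("guardrails")


def log_guardrail_event(kind: str, request_id: str, **details) -> str:
    """Emit one JSON log line per guardrail event and return it."""
    record = json.dumps(
        {"event": kind, "request_id": request_id, **details},
        sort_keys=True,
    )
    logger.info(record)
    return record
```

A call like `log_guardrail_event("escalation", "req-1", reason="low_confidence")` produces one machine-readable line per event, which is enough to answer questions such as how often each guardrail fires.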

Test guardrails extensively before you deploy. Have scenarios that trigger each guardrail. Verify they work as expected.

The guardrail mindset

Here's the difference between an agent that works and an agent that surprises you: guardrails.

Your agent will make mistakes. Your goal isn't to prevent all mistakes. It's to catch the dangerous ones before they happen, and to have clear procedures for when something breaks.

Guardrails let you deploy an agent with confidence. You're not hoping it works. You're designing for the ways it might fail and handling them.

Before you put an agent into production, think through guardrails. What needs approval? What limits do you need? What fallbacks? Get these right and your agent will be reliable.