Shipping agents in production: a checklist

A short, opinionated checklist for taking an LLM agent from a flashy demo to something you can actually keep on call for.

May 30, 2026

Most agent demos die the moment a real user touches them. Here is the short list we run through before any agent goes near production traffic.

1. Define the job, narrowly

  • Write the agent's job description in one sentence.
  • Write three concrete tasks it must do, and three it must refuse.
  • If you cannot write both lists, the scope is too big: split the agent.

2. Tools before prompts

Agents are mostly tools with a language model glued on top.

  • Each tool has a typed schema and a single responsibility.
  • Tools validate their own inputs and return structured errors, not stack traces.
  • Side-effectful tools (write, send, pay) require an explicit confirm: true argument before they act.
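
To make the three bullets above concrete, here is a minimal sketch of one side-effectful tool. Everything named in it (sendEmail, SendEmailArgs, ToolResult, the error codes, the injected deliver function) is a hypothetical stand-in, not an API from any particular agent framework; in a real system a schema library would probably replace the hand-written checks.

```typescript
// Structured result: the caller gets a machine-readable error, never a stack trace.
type ToolError = { ok: false; error: { code: string; message: string } };
type ToolResult<T> = { ok: true; value: T } | ToolError;

// Hypothetical arguments for a single-purpose, side-effectful tool.
interface SendEmailArgs {
  to: string;
  subject: string;
  body: string;
  confirm: true; // side effects run only when this is literally true
}

function fail(code: string, message: string): ToolError {
  return { ok: false, error: { code, message } };
}

// Validate the model-supplied arguments before doing anything with them.
function parseSendEmailArgs(input: unknown): ToolResult<SendEmailArgs> {
  if (typeof input !== "object" || input === null) {
    return fail("invalid_args", "arguments must be an object");
  }
  const { to, subject, body, confirm } = input as Record<string, unknown>;
  if (typeof to !== "string" || to.indexOf("@") < 1) {
    return fail("invalid_args", "to must be an email address");
  }
  if (typeof subject !== "string" || typeof body !== "string") {
    return fail("invalid_args", "subject and body must be strings");
  }
  if (confirm !== true) {
    return fail("confirmation_required", "pass confirm: true to send");
  }
  return { ok: true, value: { to, subject, body, confirm: true } };
}

// The tool itself: validate, then perform the side effect through an injected
// `deliver` function (hypothetical; it stands in for a real mail client).
function sendEmail(
  input: unknown,
  deliver: (args: SendEmailArgs) => void,
): ToolResult<{ sent: true }> {
  const parsed = parseSendEmailArgs(input);
  if (parsed.ok === false) return fail(parsed.error.code, parsed.error.message);
  try {
    deliver(parsed.value);
    return { ok: true, value: { sent: true } };
  } catch {
    // Swallow the stack trace; report a stable error code instead.
    return fail("delivery_failed", "the email could not be delivered");
  }
}
```

A call without confirm: true comes back as { ok: false, error: { code: "confirmation_required", ... } } and never reaches deliver, which gives the model a clear signal to ask the user before retrying.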

3. Guardrails you can point at

Layer            What it catches
Input filter     Prompt injection, PII you don't want logged
Tool allowlist   Model trying to call something it shouldn't
Output filter    Leaked secrets, unsafe content
Spend cap        Runaway loops
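
Two of these layers, the tool allowlist and the spend cap, are small enough to sketch in a few lines. The RunGuard class, its config fields, and the reason strings below are illustrative assumptions; the input and output filters are left out because they depend heavily on your stack.

```typescript
// Hypothetical per-run limits; the names and numbers are illustrative.
interface GuardrailConfig {
  allowedTools: readonly string[];
  maxCostUsd: number;
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

class RunGuard {
  private readonly config: GuardrailConfig;
  private spentUsd = 0;

  constructor(config: GuardrailConfig) {
    this.config = config;
  }

  // Record what each model or tool call actually cost.
  recordCost(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  // Call before executing any tool the model asks for.
  checkToolCall(tool: string): Verdict {
    if (this.config.allowedTools.indexOf(tool) === -1) {
      return { allowed: false, reason: "tool_not_allowed" };
    }
    if (this.spentUsd >= this.config.maxCostUsd) {
      return { allowed: false, reason: "spend_cap_reached" };
    }
    return { allowed: true };
  }
}
```

One guard per run keeps the check in a single place: the agent loop asks checkToolCall before every tool execution and stops the run on the first refusal, so a runaway loop ends at the cap instead of at the invoice.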

4. Observability that survives 3am

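// One structured event per agent step. user_id is included so a run can be
// traced back to a user; the logger is assumed to add the timestamp.
logger.info("agent.step", {
  run_id, user_id, step, tool, latency_ms, tokens_in, tokens_out, cost_usd
});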

If you can't answer "what did the agent do for user X at 02:14?" in under a minute, you are not ready.
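
The 3am question is really a filter over step events. Here is a rough in-memory sketch, assuming each record carries a user_id and an ISO 8601 timestamp ts (hypothetical field names); in practice this would be a saved query in whatever log store you use, and stepsForUser is an illustrative helper, not part of any logging library.

```typescript
// Shape of one "agent.step" event; user_id and ts are assumed fields.
interface StepRecord {
  run_id: string;
  user_id: string;
  ts: string; // ISO 8601, UTC
  step: number;
  tool: string;
  latency_ms: number;
  tokens_in: number;
  tokens_out: number;
  cost_usd: number;
}

// Return one user's steps inside [fromIso, toIso), oldest first.
function stepsForUser(
  records: readonly StepRecord[],
  userId: string,
  fromIso: string,
  toIso: string,
): StepRecord[] {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  return records
    .filter((r) => r.user_id === userId)
    .filter((r) => {
      const t = Date.parse(r.ts);
      return t >= from && t < to;
    })
    // filter() returned a new array, so sorting does not mutate the input.
    .sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts) || a.step - b.step);
}
```

If the equivalent query cannot be written against your real logs because a field is missing, that is worth finding out before launch.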

5. Evals before launch, evals after launch

  • A frozen offline eval set that runs on every PR.
  • A sampled online eval that grades real production runs daily.
  • A rollback plan that triggers when either eval score drops.
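
The offline half of this can be a very small gate in CI. The sketch below assumes a frozen list of cases with a pass/fail grader each and a baseline pass rate recorded from the last good release; EvalCase, runEvalSet, passesGate, and the maxDrop tolerance are all hypothetical names chosen for illustration.

```typescript
// One frozen test case; `passes` is a hypothetical grader for the agent's output.
interface EvalCase {
  id: string;
  input: string;
  passes: (output: string) => boolean;
}

interface EvalReport {
  passRate: number; // fraction of cases passed, between 0 and 1
  failedIds: string[];
}

// Run every case through the agent. A real agent call would be async; it is
// synchronous here to keep the sketch short.
function runEvalSet(
  cases: readonly EvalCase[],
  agent: (input: string) => string,
): EvalReport {
  if (cases.length === 0) throw new Error("eval set is empty");
  const failedIds: string[] = [];
  for (const c of cases) {
    if (!c.passes(agent(c.input))) failedIds.push(c.id);
  }
  return {
    passRate: (cases.length - failedIds.length) / cases.length,
    failedIds,
  };
}

// Gate: true while the pass rate is within `maxDrop` of the recorded baseline.
function passesGate(
  report: EvalReport,
  baselinePassRate: number,
  maxDrop: number,
): boolean {
  return report.passRate >= baselinePassRate - maxDrop;
}
```

The same passesGate check can run over the daily sampled production grades; a false result there is the trigger for the rollback plan rather than for a failed PR.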

Ship small. Watch closely. Iterate.