Shipping agents in production: a checklist

Most agent demos die the moment a real user touches them. Here is the short list we run through before any agent goes near production traffic.

1. Define the job, narrowly

Agents are mostly tools with a language model glued on top.

Each tool has a typed schema and a single responsibility.
Tools validate their own inputs and return structured errors, not stack traces.
Side-effectful tools (write, send, pay) require an explicit confirm: true argument.
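
The three rules above can be sketched as one tool. This is a minimal illustration, not a prescribed implementation: sendEmail, SendEmailArgs, ToolResult, and the error codes are hypothetical names invented for this example, and the actual mail delivery is assumed and left out.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real library.
type ToolError = {
  ok: false;
  code: "invalid_input" | "confirmation_required";
  message: string;
};
type ToolOk<T> = { ok: true; value: T };
type ToolResult<T> = ToolOk<T> | ToolError;

// Typed schema for a single-purpose tool.
interface SendEmailArgs {
  to: string;
  subject: string;
  body: string;
  confirm?: boolean; // side-effectful tool: the caller must pass confirm: true
}

// The tool validates its own input and returns a structured error
// instead of throwing, so no stack trace reaches the model.
function sendEmail(args: SendEmailArgs): ToolResult<{ queued: boolean }> {
  if (!args.to.includes("@")) {
    return { ok: false, code: "invalid_input", message: "to must be an email address" };
  }
  if (args.subject.trim() === "") {
    return { ok: false, code: "invalid_input", message: "subject must not be empty" };
  }
  if (args.confirm !== true) {
    return { ok: false, code: "confirmation_required", message: "pass confirm: true to send" };
  }
  // A real implementation would hand off to a mail service here (assumed, not shown).
  return { ok: true, value: { queued: true } };
}
```

Calling sendEmail without confirm: true returns a confirmation_required error that the agent loop can surface to the user before retrying, which keeps the side effect behind an explicit decision.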

Layer	What it catches
Input filter	Prompt injection, PII you don't want logged
Tool allowlist	Model trying to call something it shouldn't
Output filter	Leaked secrets, unsafe content
Spend cap	Runaway loops
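
Two of the layers in the table, the tool allowlist and the spend cap, are simple enough to sketch together. RunGuard, GuardDecision, and the dollar limit are assumptions made up for this example; the input and output filters are not shown because they depend on whatever classifier or pattern set you use.

```typescript
// Hypothetical guard; class name, method names, and limits are illustrative assumptions.
type GuardDecision = { allowed: true } | { allowed: false; reason: string };

class RunGuard {
  private readonly allowedTools: ReadonlySet<string>;
  private readonly maxSpendUsd: number;
  private spentUsd = 0;

  constructor(allowedTools: ReadonlySet<string>, maxSpendUsd: number) {
    this.allowedTools = allowedTools;
    this.maxSpendUsd = maxSpendUsd;
  }

  // Tool allowlist: reject any tool name that was not registered for this agent.
  checkTool(tool: string): GuardDecision {
    if (this.allowedTools.has(tool)) {
      return { allowed: true };
    }
    return { allowed: false, reason: `tool not allowlisted: ${tool}` };
  }

  // Spend cap: refuse a step whose cost would push the run past its budget,
  // which is what stops a runaway loop from billing indefinitely.
  recordCost(costUsd: number): GuardDecision {
    if (this.spentUsd + costUsd > this.maxSpendUsd) {
      return { allowed: false, reason: "spend cap exceeded" };
    }
    this.spentUsd += costUsd;
    return { allowed: true };
  }
}
```

In this sketch the agent loop would call checkTool before dispatching each tool call and recordCost after each model response, ending the run on the first denied decision.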

logger.info("agent.step", {
  run_id, step, tool, latency_ms, tokens_in, tokens_out, cost_usd
});

If you can't answer "what did the agent do for user X at 02:14?" in under a minute, you are not ready.
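
Answering that question quickly comes down to filtering step records by user and time window. The sketch below assumes each record also carries a user_id and an ISO 8601 UTC timestamp ts; neither field appears in the log call above, so both are assumptions, as are the StepRecord and stepsForUser names.

```typescript
// Field names mirror the log call above, plus two assumed fields.
interface StepRecord {
  run_id: string;
  step: number;
  tool: string;
  latency_ms: number;
  tokens_in: number;
  tokens_out: number;
  cost_usd: number;
  user_id: string; // assumption: not among the logged fields shown above
  ts: string; // assumption: ISO 8601 UTC timestamp, e.g. "2024-01-01T02:14:05Z"
}

// Return one user's steps inside [from, to), ordered by time and then step number.
function stepsForUser(
  records: StepRecord[],
  userId: string,
  from: string,
  to: string,
): StepRecord[] {
  const start = Date.parse(from);
  const end = Date.parse(to);
  return records
    .filter((r) => r.user_id === userId)
    .filter((r) => {
      const t = Date.parse(r.ts);
      return t >= start && t < end;
    })
    .sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts) || a.step - b.step);
}
```

In practice the same filter would usually run as a query in your log store rather than in application code; the point is that the user and the timestamp have to be in every record for the query to be possible at all.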

Ship small. Watch closely. Iterate.