Running MCP servers in production.

You have a working MCP server. The gap between that and something an agent can hit in production is mostly about trust boundaries: who is calling, what they're allowed to do, and what happens when a write runs twice. Here's what closes that gap.

The intro server is a prototype.

The first MCP server you build wires a few tools to a transport and calls it done. It works because there's one caller, one set of credentials baked into the process, and no concurrency to speak of.

Production breaks every one of those assumptions. Multiple agents call the same server. Some tools write. Sessions outlive a single tool change. And the moment the server is reachable, its authorization model is the only thing standing between an agent and your data.

Everything below is the work between 'it returns a result' and 'I'd let an autonomous agent call it against real systems.'

OAuth and least-privilege scopes per tool.

A token that can call your server should not automatically be allowed to call every tool on it. Treat each tool as a capability with its own scope, and check the scope at the call boundary — not in a comment, not in the agent's prompt.

MCP's authorization spec builds on OAuth 2.1: the server acts as a resource server that validates an access token and the scopes it carries. Map scopes to tools narrowly. A read tool gets `records:read`; the tool that issues refunds gets `refunds:write` and nothing borrows it.

The wrapper is boring on purpose. Boring is auditable.

function withScope<T>(scope: string, handler: ToolHandler<T>): ToolHandler<T> {
  return async (args, ctx) => {
    if (!ctx.auth?.scopes.includes(scope)) {
      throw new McpError(ErrorCode.InvalidRequest, `missing scope: ${scope}`);
    }
    return handler(args, ctx);
  };
}

server.registerTool(
  "issue_refund",
  { description: "Refund a charge", inputSchema: refundSchema },
  withScope("refunds:write", issueRefund),
);

Keep secrets out of the model context.

The model never needs your database password or your payment provider key. It needs the result of an action that used them. Keep the credential server-side, resolve it at call time, and return only what the agent must reason about.

The failure mode is subtle: a tool that echoes a full upstream API response can leak an internal token, a signed URL, or a customer's PII straight into the transcript — where it gets logged, cached, and possibly retained. Filter the response shape deliberately. Return the fields the agent uses, drop the rest.

Same rule for errors. 'Upstream auth failed' is a fine message. The actual bearer token in the stack trace is not.

Rate limits and idempotency on anything that writes.

Agents retry. They retry on timeouts, on ambiguous responses, on their own planning loops. A read tool firing twice is wasted compute. A write tool firing twice is a double charge.

Require an idempotency key on every write tool and dedupe on it. The key can come from the caller or be derived from the arguments, but the guarantee is the same: the second identical call returns the first call's result instead of doing the work again. The sketch below covers sequential retries; for concurrent duplicates you also need to record the key before the work starts, so the second call blocks on the first instead of racing it.

Pair that with per-caller rate limits so one looping agent can't exhaust an upstream quota for everyone.

async function withIdempotency<T>(key: string, work: () => Promise<T>): Promise<T> {
  const cached = await store.get(key);
  if (cached) return cached.result as T;

  const result = await work();
  await store.set(key, { result }, { ttlSeconds: 86_400 });
  return result;
}

Versioning so a tool change doesn't break a live session.

An agent reads your tool schema at the start of a session and plans against it. If you rename an argument or tighten a type mid-session, calls that were valid a minute ago start failing — and the agent has no way to know why.

Treat tool schemas like an API contract. Additive changes (a new optional argument, a new tool) are safe. Breaking changes (renamed or removed arguments, narrowed types, changed semantics) need a new tool name or an explicit version suffix, with the old one kept alive through a deprecation window.

MCP servers can signal a tool-list change to connected clients, but a client mid-task may not re-fetch. So announce removals out of band and assume some agents are still planning against the old contract until their next session.

Structured traces on every call.

When an agent does something wrong through your server, you need to reconstruct exactly what it called and what came back. Free-text logs won't cut it.

Emit a structured record per call: tool name, schema version, caller identity, scope checked, idempotency key, latency, outcome, and a redacted argument summary. That's enough to answer 'which agent issued that refund and why' without replaying the whole session.

Carry trace IDs across the boundary into your upstream calls so a single agent action is followable end to end. This is also where you catch an agent stuck in a retry loop before it shows up on a bill.

Portability is the double edge.

MCP's real leverage is that a capability you expose works for any MCP client — Claude today, a different agent tomorrow, an internal orchestrator after that. You write the tool once and every agent can use it.

That portability is also the risk. The same standardization that lets a trusted agent call your refund tool lets any client that obtains a valid token call it. There is no 'this tool is only for my agent' — the boundary is the token and the scopes, full stop.

So design as if an unfamiliar client will call every tool, because eventually one will. Least privilege, idempotency, and tracing aren't hardening you add later; on a portable interface they're the interface.

On a portable interface, the authorization model isn't a layer around the tool — it is the tool's contract.

The production checklist.

Before an MCP server faces real agents and real data: every tool gated by an OAuth scope it actually needs; no secret or raw upstream payload reachable from the model context; every write tool idempotent and rate-limited per caller; schemas versioned with a deprecation path for breaking changes; a structured trace on every call with IDs that reach upstream.

None of this is exotic. It's the same discipline any write-capable API earns before it ships. MCP just makes skipping it more expensive, because the thing on the other end is an autonomous loop that will find your missing idempotency guard faster than a human ever would.

Common questions

Do I need OAuth for an internal MCP server only my own agents call?

If it can write or touch sensitive data, yes. 'Internal' is a network assumption, not an authorization model — and MCP's portability means a client you didn't plan for can call any tool that holds a valid token. Scopes per tool give you a real boundary instead of a hopeful one. For a read-only server on a locked-down network the bar is lower, but you still want per-caller rate limits and tracing.

Where should idempotency keys come from — the agent or the server?

Prefer caller-supplied keys for write tools, because the agent knows when two calls represent the same intended action versus two genuinely separate ones. Fall back to deriving the key from the arguments when the caller doesn't send one. Either way, dedupe server-side with a TTL long enough to cover an agent's retry window — a day is a reasonable default.

How do I change a tool's schema without breaking agents already running?

Make additive changes freely — new optional arguments and new tools don't break existing plans. For anything breaking (renaming or removing an argument, narrowing a type, changing what the tool does), introduce a new tool name or version suffix and keep the old one through a deprecation window. An agent that re-reads the tool list at session start will pick up the change; one mid-task may still be planning against the old contract, so never pull it out from under it.