When MCP is the right shape.
MCP is plumbing for "give an LLM access to this capability." So is a tool call inside an AI SDK route. So is a CLI the agent shells out to. They are not interchangeable.
| Choose | When | Example |
|---|---|---|
| MCP server | The capability is consumed by multiple clients (your IDE agent, your chat product, a Slack bot). | Internal "company search" surfaced to Cursor, Claude Code, and a customer support agent. |
| Tool call in a route | The capability is consumed only by one chat endpoint inside one app. | listMyOrders on your AI SDK chat route. |
| CLI / shell | You already have a CLI; the agent can just run it. | gh pr view, terraform plan, psql -c. |
| HTTP API the agent calls directly | The agent has internet access and you have a public, well-documented API. | Public REST APIs the agent already knows how to call. |
The MCP test is "would I expose this to more than one client?" If yes, build the server. If no, save yourself a deployment and inline it as a tool call.
Transport: stdio or Streamable HTTP.
| Concern | stdio | Streamable HTTP |
|---|---|---|
| Where it runs | Local, spawned by the client | Hosted, anything connects |
| Users | Single user (the one running the client) | Multi-user |
| Auth | Inherits OS env | OAuth 2.1 / bearer / mTLS |
| Discovery | JSON config in the client | URL |
| Distribution | npm package + a config snippet | Public URL + auth instructions |
| Use when | Local file system, dev databases, IDE-only tools | Internal services, hosted SaaS MCPs, anything multi-tenant |
The older HTTP+SSE transport was deprecated in the spec's 2025-03-26 revision. It still works for backward compatibility, but new servers should use Streamable HTTP — single endpoint, works behind load balancers, simpler operationally. If a tutorial wires new SSEServerTransport(...), it is out of date.
A stdio server in 40 lines.
The local case. A Node.js process the client launches; you teach it a tool, you're done.
```json
{
  "name": "acme-time-mcp",
  "type": "module",
  "bin": { "acme-time-mcp": "./dist/index.js" },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.23.0"
  }
}
```

```typescript
#!/usr/bin/env node
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'acme-time', version: '1.0.0' });

server.registerTool(
  'currentTime',
  {
    title: 'Current time',
    description: 'Returns the current time in a given IANA timezone.',
    inputSchema: { zone: z.string().describe('e.g. America/New_York') }
  },
  async ({ zone }) => {
    // Throws a RangeError if `zone` is not a valid IANA timezone name.
    const now = new Date().toLocaleString('en-US', { timeZone: zone });
    return { content: [{ type: 'text', text: now }] };
  }
);

// stdout carries the JSON-RPC stream: log with console.error, never console.log.
const transport = new StdioServerTransport();
await server.connect(transport);
```
```json
{
  "mcpServers": {
    "acme-time": {
      "command": "npx",
      "args": ["-y", "acme-time-mcp"]
    }
  }
}
```

That's the whole loop: register a tool with a zod input schema, hand the server a stdio transport, publish to npm, paste a four-line config into the client. Three to four hours from idea to "the agent can call this in your IDE."
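Under the hood, the stdio transport exchanges JSON-RPC 2.0 messages as newline-delimited JSON over the child process's stdin and stdout, and a message must not contain embedded newlines. If you want to smoke-test a stdio server without an IDE, a minimal sketch of the client side looks like this: build the usual handshake (initialize, then the notifications/initialized notification), follow it with one tools/call, and split the server's stdout chunks back into messages. The helper names are hypothetical test code, not SDK APIs; the protocolVersion string is an assumption (match what your SDK negotiates); and spawning the process and piping these lines into it is left out.

```typescript
// Hypothetical smoke-test helpers for a stdio MCP server (not SDK APIs).
type JsonRpcMessage = {
  jsonrpc: '2.0';
  id?: number;
  method?: string;
  params?: Record<string, unknown>;
  result?: unknown;
  error?: { code: number; message: string };
};

// One message per line. JSON.stringify escapes newlines inside strings,
// so the only raw newline is the delimiter we append.
function frame(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Handshake a client performs before calling tools, followed by one tools/call.
function handshakeThenCall(tool: string, args: Record<string, unknown>): JsonRpcMessage[] {
  return [
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion: '2025-03-26', // assumption: use a version your SDK supports
        capabilities: {},
        clientInfo: { name: 'smoke-test', version: '0.0.1' },
      },
    },
    { jsonrpc: '2.0', method: 'notifications/initialized' }, // notification: no id
    { jsonrpc: '2.0', id: 2, method: 'tools/call', params: { name: tool, arguments: args } },
  ];
}

// stdout arrives in arbitrary chunks; buffer until a newline completes a message.
class LineReader {
  private buffer = '';

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // keep the trailing partial line for the next chunk
    return lines.filter((l) => l.trim() !== '').map((l) => JSON.parse(l) as JsonRpcMessage);
  }
}

// What you would write to the server's stdin, in order.
const outgoing = handshakeThenCall('currentTime', { zone: 'America/New_York' }).map(frame).join('');
```

Feed every stdout chunk to one LineReader and match responses to requests by id; the notification gets no response.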
A Streamable HTTP server.
The hosted case. Mount the transport on a single /mcp endpoint, deploy it like any other Node service.
```typescript
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { z } from 'zod';

function build() {
  const server = new McpServer({ name: 'acme-search', version: '1.0.0' });
  server.registerTool(
    'searchDocs',
    {
      title: 'Search internal docs',
      description: 'Full-text search over the company docs corpus.',
      inputSchema: {
        q: z.string().min(2).describe('query'),
        limit: z.number().int().min(1).max(20).default(5)
      }
    },
    async ({ q, limit }) => {
      const hits = await searchCorpus(q, limit); // your code
      return { content: [{ type: 'text', text: JSON.stringify(hits) }] };
    }
  );
  return server;
}

const app = express();
app.use(express.json());

app.post('/mcp', async (req, res) => {
  // A fresh server and transport per request.
  const server = build();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined // stateless: simplest, scales horizontally
  });
  res.on('close', () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);
```
Stateless is the right default. You get a horizontally scalable HTTP service with no sticky-session requirements. Opt into stateful sessions only when you need server-pushed notifications or resumability across reconnects — otherwise the operational simplicity is worth more than the missing features.
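Because the endpoint is plain HTTP, you can smoke-test it without an MCP client. The Streamable HTTP transport expects a JSON-RPC POST whose Accept header lists both application/json and text/event-stream, and the server may answer with either a single JSON body or an SSE stream, so a client has to handle both. The sketch below assumes the stateless server above is listening on localhost:3000 and accepts a tools/call without a prior initialize (worth checking against your SDK version); buildToolCall, parseMcpResponse, and callTool are hypothetical helpers, not SDK APIs.

```typescript
// Hypothetical smoke-test helpers for a Streamable HTTP MCP endpoint.
type JsonRpcResponse = {
  jsonrpc: '2.0';
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

function buildToolCall(name: string, args: Record<string, unknown>, id = 1) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Clients must advertise both response styles.
      Accept: 'application/json, text/event-stream',
    },
    body: JSON.stringify({ jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } }),
  };
}

// Accept either a plain JSON body or an SSE stream of `data:` events.
function parseMcpResponse(contentType: string, text: string): JsonRpcResponse[] {
  if (contentType.includes('application/json')) {
    const parsed = JSON.parse(text);
    return Array.isArray(parsed) ? parsed : [parsed];
  }
  // SSE: events are separated by a blank line; one event's data may span several `data:` lines.
  return text
    .split(/\r?\n\r?\n/)
    .map((event) =>
      event
        .split(/\r?\n/)
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, '')) // strip "data:" and one optional space
        .join('\n'),
    )
    .filter((data) => data !== '')
    .map((data) => JSON.parse(data) as JsonRpcResponse);
}

async function callTool(url: string, name: string, args: Record<string, unknown>) {
  const res = await fetch(url, buildToolCall(name, args));
  return parseMcpResponse(res.headers.get('content-type') ?? '', await res.text());
}

// Usage, with the server running:
// callTool('http://localhost:3000/mcp', 'searchDocs', { q: 'vacation policy', limit: 3 }).then(console.log);
```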
Auth & permissioning.
A hosted MCP server with tools that touch the outside world is just an API with a different schema. Treat it like one.
- Authenticate the request. The MCP spec supports OAuth 2.1 with PKCE for hosted servers. For internal services, a bearer token or mTLS is fine. Don't ship a public /mcp with no auth.
- Authorize per tool, scoped to the user. Resolve the calling identity at the top of the /mcp handler, before handleRequest, and pass it into every tool handler via closure. The tool's scope is the user's scope, never a service account's.
- Validate every input with zod. Models invent fields, send strings where numbers belong, and omit required fields. The schema is your contract.
- Rate limit by user, not by IP. One user with a runaway agent should not exhaust the whole server's budget.
- Allow-list tools per client where useful. If your "company search" MCP is consumed by both internal and customer-facing agents, expose only the safe subset to the latter.
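Rate limiting by user can be as small as a fixed-window counter keyed on the authenticated user ID. The sketch below is an in-memory, single-process illustration with an injectable clock so it can be tested; the class name and limits are made up, and a horizontally scaled deployment would need a shared store (or your gateway's limiter) instead of a Map.

```typescript
// Hypothetical per-user fixed-window rate limiter (in-memory, single process only).
class PerUserRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number; // max calls per window
  private windowMs: number; // window length in milliseconds
  private now: () => number; // injectable clock for tests

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if this user may make another call right now.
  tryAcquire(userId: string): boolean {
    const t = this.now();
    const w = this.windows.get(userId);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(userId, { start: t, count: 1 }); // open a new window for this user
      return true;
    }
    if (w.count >= this.limit) return false; // budget for this window is spent
    w.count += 1;
    return true;
  }
}

// In the /mcp handler, after authenticating:
//   if (!limiter.tryAcquire(user.id)) return res.status(429).end();
```

Keying on the user rather than the IP means a runaway agent exhausts only its own owner's budget, even when many users sit behind one NAT or proxy.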
```typescript
app.post('/mcp', async (req, res) => {
  const user = await authenticate(req); // your code: returns null on failure
  if (!user) return res.status(401).end();

  const server = buildFor(user); // your code: tools close over user.id, user.scope
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on('close', () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});
```
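One way to get both "tools close over the user" and the per-client allow-list is to build the tool list per request from the authenticated identity. The sketch keeps the MCP SDK out of the picture so it runs standalone: User, ToolDef, searchAs, and toolsFor are all hypothetical, and in a real buildFor you would pass each returned definition to server.registerTool with a proper zod input schema.

```typescript
// Hypothetical shapes; in a real server each entry becomes one server.registerTool(...) call.
type User = { id: string; scope: 'internal' | 'customer' };
type ToolResult = { content: { type: 'text'; text: string }[] };
type ToolDef = {
  name: string;
  audience: Array<User['scope']>; // which kinds of caller may see this tool
  handler: (args: { q: string }) => Promise<ToolResult>;
};

// Stand-in for your data layer; every query is scoped to the calling user.
async function searchAs(userId: string, q: string, corpus: 'public' | 'internal'): Promise<string[]> {
  return [`${corpus} results for "${q}" visible to ${userId}`];
}

function toolsFor(user: User): ToolDef[] {
  const all: ToolDef[] = [
    {
      name: 'searchPublicDocs',
      audience: ['internal', 'customer'],
      // The handler closes over user.id: the model cannot pass in a different identity.
      handler: async ({ q }) => ({
        content: [{ type: 'text', text: JSON.stringify(await searchAs(user.id, q, 'public')) }],
      }),
    },
    {
      name: 'searchInternalDocs',
      audience: ['internal'],
      handler: async ({ q }) => ({
        content: [{ type: 'text', text: JSON.stringify(await searchAs(user.id, q, 'internal')) }],
      }),
    },
  ];
  // Allow-list: customer-facing callers only ever see the safe subset.
  return all.filter((t) => t.audience.includes(user.scope));
}
```

Filtering at registration time, rather than rejecting calls inside the handler, means the model on a customer-facing agent never even sees the internal tool in tools/list.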
Ops: deploying it.
| Concern | What to do |
|---|---|
| Runtime | Node 20+ container, Bun, or Cloudflare Workers (use the Web Standards transport variant). |
| Health checks | Expose GET /healthz separately; in the stateless setup, /mcp only handles POST. |
| TLS | Terminate at your reverse proxy (Caddy, nginx, your platform). Never run plain HTTP in production. |
| Logs | Log per-tool latency, error rate, and authenticated user ID — not the tool inputs (likely sensitive). |
| Tracing | OpenTelemetry GenAI semantic conventions (gen_ai.*) for compatibility with LLM-observability tools. |
| Versioning | Tool name is part of your public API. Additive changes are safe; renames and removals are breaking. |
| Distribution | Publish stdio servers as npm packages (npx -y your-server). Hosted servers: stable URL + auth docs. |
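The Logs row can be enforced in one place by wrapping every tool handler. The sketch records tool name, user ID, latency, and outcome as a JSON line on stderr and deliberately never touches the arguments; withMetrics and the log shape are hypothetical, and you might emit OpenTelemetry spans instead of (or alongside) log lines.

```typescript
// Hypothetical wrapper: log per-tool latency and outcome, never the inputs.
type LogLine = { tool: string; userId: string; ms: number; ok: boolean };

function withMetrics<A, R>(
  tool: string,
  userId: string,
  handler: (args: A) => Promise<R>,
  log: (line: LogLine) => void = (line) => console.error(JSON.stringify(line)), // stderr by default
): (args: A) => Promise<R> {
  return async (args: A) => {
    const start = performance.now();
    let ok = false;
    try {
      const result = await handler(args);
      ok = true;
      return result;
    } finally {
      // Runs on success and failure; `args` is intentionally absent from the log line.
      log({ tool, userId, ms: Math.round(performance.now() - start), ok });
    }
  };
}
```

Wrap each handler before passing it to registerTool; the error still propagates to the caller, the log line just records that it happened.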
Pitfalls.
The session machinery is the part most likely to break in production. Start stateless. Add sessions only when you have a feature that genuinely needs them — resumability across reconnects, server-pushed notifications. Most MCPs never do.
Models reason better about three well-named tools than one tool with a 20-field input. Split doEverything into listX, getX, createX. Smaller tools, sharper descriptions, fewer hallucinations.
Tool output is untrusted text that flows back to the model. Sanitize, structure as JSON, never paste raw scraped HTML — an attacker who controls the source controls your model. (See the security guide for the long version.)
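A minimal illustration of "structure as JSON, never paste raw HTML": reduce scraped markup to bounded plain text and return it inside a labeled JSON field instead of as free-floating prose. The regex tag stripping is a crude placeholder, not a real HTML sanitizer, and it does nothing against instructions written in plain text; htmlToText, toSafeToolResult, and the 2,000-character cap are hypothetical.

```typescript
// Hypothetical helpers: turn scraped HTML into bounded, labeled tool output.
type ToolResult = { content: { type: 'text'; text: string }[] };

function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ') // drop remaining tags (crude)
    .replace(/&nbsp;/g, ' ')
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();
}

function toSafeToolResult(url: string, html: string, maxChars = 2000): ToolResult {
  const full = htmlToText(html);
  const text = full.slice(0, maxChars); // bound what reaches the model
  // A labeled field makes explicit that this is quoted source material, not instructions.
  const payload = { source: url, untrustedContent: text, truncated: full.length > maxChars };
  return { content: [{ type: 'text', text: JSON.stringify(payload) }] };
}
```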
The description is how the model decides whether to call your tool. "Returns the company's internal documentation matching q; use this before answering questions about policies, runbooks, or product internals" picks up calls; "search" does not.