When should I build an MCP server vs just add a tool to my AI SDK route?

Build an MCP server when the same capability needs to be reused across multiple clients — your IDE agent, your chat product, a Slack bot, a CLI. Use a plain tool call when the capability is consumed only by one chat endpoint inside one app. MCP buys you portability at the cost of an extra hop; if you don't need the portability, skip the hop.

How should I handle auth?

For stdio servers, the server inherits the user's environment, so pass secrets via env vars in the client config and treat the server as trusted local code. For Streamable HTTP servers, the spec supports OAuth 2.1 with PKCE; for internal services you can use simpler bearer tokens or mTLS. Either way, the server should authenticate the request, then run each tool with the calling user's scope, never with a single service account that can do anything.

Stateful or stateless sessions?

Default to stateless — set sessionIdGenerator: undefined. You get a horizontally scalable, load-balancer-friendly HTTP API with no surprises. Opt into stateful sessions only when you genuinely need server-pushed notifications, resumability across reconnects, or per-session state that can't live in your DB. Most internal MCPs are happier stateless.

How do I version a tool without breaking clients?

Treat tool names as part of a public API. Adding optional input fields and adding tools are backwards-compatible. Renaming a tool, removing fields, or changing input types are breaking — ship a new tool alongside the old one and deprecate the old one with a description note. Models are good at picking the newer of two near-duplicate tools when the description tells them to.
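
The compatibility rules above can be checked mechanically. The sketch below is plain TypeScript and not part of the MCP SDK: `Field`, `InputShape`, and `breakingChanges` are hypothetical names, and treating a newly added required field as breaking is an assumption that follows from the rule that only optional additions are safe.

```typescript
// Hypothetical, simplified description of one tool input field.
interface Field {
  type: 'string' | 'number' | 'boolean';
  optional: boolean;
}

type InputShape = Record<string, Field>;

// Returns a human-readable list of breaking changes between two versions
// of a tool's input shape. An empty list means the change is additive.
function breakingChanges(oldShape: InputShape, newShape: InputShape): string[] {
  const problems: string[] = [];

  for (const [name, oldField] of Object.entries(oldShape)) {
    const newField = newShape[name];
    if (newField === undefined) {
      problems.push(`removed field "${name}"`);
    } else if (newField.type !== oldField.type) {
      problems.push(`changed type of "${name}" from ${oldField.type} to ${newField.type}`);
    } else if (oldField.optional && !newField.optional) {
      problems.push(`made "${name}" required`);
    }
  }

  for (const [name, newField] of Object.entries(newShape)) {
    if (!(name in oldShape) && !newField.optional) {
      problems.push(`added required field "${name}"`);
    }
  }

  return problems;
}

// Illustrative shapes: one additive change, one rename.
const v1: InputShape = { q: { type: 'string', optional: false } };
const additive: InputShape = { ...v1, limit: { type: 'number', optional: true } };
const renamed: InputShape = { query: { type: 'string', optional: false } };
```

If `breakingChanges(v1, renamed)` returns anything, that is the signal to ship the change under a new tool name and mark the old tool as deprecated in its description.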

Building MCP Servers — TypeScript, Streamable HTTP, Production

stdio or Streamable HTTP?

Use stdio for local, single-user integrations that the client launches itself (an IDE spawning a child process). Use Streamable HTTP for remote, multi-user servers that other clients reach over the network: hosted MCPs, internal services, and anything you deploy to a container platform. The older HTTP+SSE transport is deprecated as of the 2025-03-26 spec revision; new servers should use Streamable HTTP.

CH 01

When MCP is the right shape.

MCP is plumbing for "give an LLM access to this capability." So is a tool call inside an AI SDK route. So is a CLI the agent shells out to. They are not interchangeable.

Choose	When	Example
MCP server	The capability is consumed by multiple clients (your IDE agent, your chat product, a Slack bot).	Internal "company search" surfaced to Cursor, Claude Code, and a customer support agent.
Tool call in a route	The capability is consumed only by one chat endpoint inside one app.	`listMyOrders` on your AI SDK chat route.
CLI / shell	You already have a CLI; the agent can just run it.	`gh pr view`, `terraform plan`, `psql -c`.
HTTP API the agent calls directly	The agent has internet access and you have a public, well-documented API.	Public REST APIs the agent already knows how to call.

The MCP test is "would I expose this to more than one client?" If yes, build the server. If no, save yourself a deployment and inline it as a tool call.

CH 02

Transport: stdio or Streamable HTTP.

Concern	stdio	Streamable HTTP
Where it runs	Local, spawned by the client	Hosted, reachable by any client
Users	Single user (the one running the client)	Multi-user
Auth	Inherits OS env	OAuth 2.1 / bearer / mTLS
Discovery	JSON config in the client	URL
Distribution	npm package + a config snippet	Public URL + auth instructions
Use when	Local file system, dev databases, IDE-only tools	Internal services, hosted SaaS MCPs, anything multi-tenant

"HTTP + SSE" is no longer the answer

The older HTTP+SSE transport was deprecated in the spec's 2025-03-26 revision. It still works for backward compatibility, but new servers should use Streamable HTTP — single endpoint, works behind load balancers, simpler operationally. If a tutorial wires new SSEServerTransport(...), it is out of date.

CH 03

A stdio server in 40 lines.

The local case. A Node.js process the client launches; you teach it a tool, you're done.

package.json (excerpt)

{
  "name": "acme-time-mcp",
  "type": "module",
  "bin": { "acme-time-mcp": "./dist/index.js" },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.23.0"
  }
}

src/index.ts — stdio MCP

#!/usr/bin/env node
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'acme-time', version: '1.0.0' });

server.registerTool(
  'currentTime',
  {
    title: 'Current time',
    description: 'Returns the current time in a given IANA timezone.',
    inputSchema: { zone: z.string().describe('e.g. America/New_York') }
  },
  async ({ zone }) => {
    const now = new Date().toLocaleString('en-US', { timeZone: zone });
    return { content: [{ type: 'text', text: now }] };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

Client config (Cursor / Claude Desktop)

{
  "mcpServers": {
    "acme-time": {
      "command": "npx",
      "args": ["-y", "acme-time-mcp"]
    }
  }
}

That's the whole loop: register a tool with a zod input schema, hand the server a stdio transport, compile and publish to npm, and paste a short JSON config into the client. Three to four hours from idea to "the agent can call this in your IDE."
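
One gap in the stdio example: `toLocaleString` throws a `RangeError` when the model passes a zone the runtime does not recognize. A minimal sketch of a guard, assuming you would rather return a readable message flagged with the tool result's `isError` field than let the exception surface. `TextResult` and `currentTimeResult` are hypothetical names, not SDK exports.

```typescript
// Minimal shape of a text tool result; `isError` marks a failed call.
interface TextResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Formats `now` in the given IANA zone, or returns an error result
// instead of throwing when the zone is not recognized.
function currentTimeResult(zone: string, now: Date = new Date()): TextResult {
  try {
    const text = now.toLocaleString('en-US', { timeZone: zone });
    return { content: [{ type: 'text', text }] };
  } catch (err) {
    if (err instanceof RangeError) {
      return {
        content: [{ type: 'text', text: `Unknown IANA timezone: ${zone}` }],
        isError: true
      };
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}
```

The tool handler can then shrink to `async ({ zone }) => currentTimeResult(zone)`.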

CH 04

A Streamable HTTP server.

The hosted case. Mount the transport on a single /mcp endpoint, deploy it like any other Node service.

src/server.ts — stateless Streamable HTTP

import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { z } from 'zod';

function build() {
  const server = new McpServer({ name: 'acme-search', version: '1.0.0' });

  server.registerTool(
    'searchDocs',
    {
      title: 'Search internal docs',
      description: 'Full-text search over the company docs corpus.',
      inputSchema: {
        q:     z.string().min(2).describe('query'),
        limit: z.number().int().min(1).max(20).default(5)
      }
    },
    async ({ q, limit }) => {
      const hits = await searchCorpus(q, limit); // your code
      return { content: [{ type: 'text', text: JSON.stringify(hits) }] };
    }
  );
  return server;
}

const app = express();
app.use(express.json());

app.post('/mcp', async (req, res) => {
  const server = build();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined // stateless: simplest, scales horizontally
  });
  res.on('close', () => { transport.close(); server.close(); });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);

Stateless is the right default. You get a horizontally scalable HTTP service with no sticky-session requirements. Opt into stateful sessions only when you need server-pushed notifications or resumability across reconnects — otherwise the operational simplicity is worth more than the missing features.

CH 05

Auth & permissioning.

A hosted MCP server with tools that touch the outside world is just an API with a different schema. Treat it like one.

Authenticate the request. The MCP spec supports OAuth 2.1 with PKCE for hosted servers. For internal services, a bearer token or mTLS is fine. Don't ship a public /mcp with no auth.
Authorize per tool, scoped to the user. Resolve the calling identity in the route handler, before transport.handleRequest runs, and let every tool handler close over it. The tool's scope is the user's scope, never a service account's.
Validate every input with zod. Models invent fields, send strings where numbers go, miss requireds. The schema is your contract.
Rate limit by user, not by IP. One user with a runaway agent should not exhaust the whole server's budget.
Allow-list tools per client where useful. If your "company search" MCP is consumed by both internal and customer-facing agents, expose only the safe subset to the latter.
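
"Rate limit by user" can be sketched as a fixed-window counter keyed by the authenticated user ID. This is an in-memory illustration only: `UserRateLimiter` is a hypothetical class, the limits are made-up numbers, and a multi-instance deployment would need a shared store instead of a `Map`.

```typescript
// Fixed-window rate limiter keyed by user ID (not by IP address).
class UserRateLimiter {
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { startedAt: number; count: number }>();

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if the user still has budget in the
  // current window; returns false once the budget is used up.
  allow(userId: string, nowMs: number = Date.now()): boolean {
    let state = this.windows.get(userId);
    if (state === undefined || nowMs - state.startedAt >= this.windowMs) {
      state = { startedAt: nowMs, count: 0 }; // start a fresh window
      this.windows.set(userId, state);
    }
    if (state.count >= this.maxCalls) return false;
    state.count += 1;
    return true;
  }
}

// Illustrative budget: 2 tool calls per user per minute.
const limiter = new UserRateLimiter(2, 60000);
```

Call `limiter.allow(user.id)` after authentication and before dispatching the tool; one user hitting the limit leaves every other user's budget untouched.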

Auth shape (sketch)

app.post('/mcp', async (req, res) => {
  const user = await authenticate(req); // your code: resolves to null when credentials are missing or invalid
  if (!user) return res.status(401).end();

  const server = buildFor(user); // tools close over user.id, user.scope
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on('close', () => { transport.close(); server.close(); });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});
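
What a function like `buildFor(user)` does internally might look like the following. This is a plain-TypeScript model of the idea rather than SDK code: `ToolDef`, `catalogue`, `toolsFor`, and the scope strings are all hypothetical, and in a real server the loop body would register each permitted tool on the MCP server instead of filling a `Map`.

```typescript
interface User {
  id: string;
  scope: string[];
}

interface ToolDef {
  name: string;
  requiredScope: string;
  // The handler receives the authenticated user, never a service account.
  run: (user: User, args: Record<string, unknown>) => string;
}

type BoundTool = (args: Record<string, unknown>) => string;

// Hypothetical catalogue; real handlers would query your own systems.
const catalogue: ToolDef[] = [
  {
    name: 'searchDocs',
    requiredScope: 'docs:read',
    run: (user, args) => `docs for ${user.id}: ${String(args['q'])}`
  },
  {
    name: 'deleteDoc',
    requiredScope: 'docs:write',
    run: (user, args) => `deleted ${String(args['id'])} as ${user.id}`
  }
];

// Returns only the tools this user may call, each closed over the user.
function toolsFor(user: User): Map<string, BoundTool> {
  const bound = new Map<string, BoundTool>();
  for (const tool of catalogue) {
    if (user.scope.includes(tool.requiredScope)) {
      bound.set(tool.name, (args) => tool.run(user, args));
    }
  }
  return bound;
}
```

A tool the user cannot call is never exposed at all, so the model cannot attempt it, and every exposed handler already carries the caller's identity.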

CH 06

Ops: deploying it.

Concern	What to do
Runtime	Node 20+ container, Bun, or Cloudflare Workers (use the Web Standards transport variant).
Health checks	Expose `GET /healthz` separately; `/mcp` is POST-only.
TLS	Terminate at your reverse proxy (Caddy, nginx, your platform). Never run plain HTTP in production.
Logs	Log per-tool latency, error rate, and authenticated user ID — not the tool inputs (likely sensitive).
Tracing	OpenTelemetry GenAI semantic conventions (`gen_ai.*`) for compatibility with LLM-observability tools.
Versioning	Tool name is part of your public API. Additive changes are safe; renames and removals are breaking.
Distribution	Publish stdio servers as npm packages (`npx -y your-server`). Hosted servers: stable URL + auth docs.
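
The logging row can be sketched as a wrapper around each tool handler. It is an illustration under stated assumptions: `withToolLogging` and `ToolLogRecord` are hypothetical names, `log` stands in for whatever logger you already use, and the injectable `clock` exists only to make the wrapper easy to test.

```typescript
interface ToolLogRecord {
  tool: string;
  userId: string;
  latencyMs: number;
  ok: boolean;
}

type Handler<A, R> = (args: A) => Promise<R>;

// Wraps a tool handler so each call emits one record with latency,
// outcome, and user ID. The arguments are deliberately never logged.
function withToolLogging<A, R>(
  tool: string,
  userId: string,
  handler: Handler<A, R>,
  log: (record: ToolLogRecord) => void,
  clock: () => number = Date.now
): Handler<A, R> {
  return async (args: A): Promise<R> => {
    const startedAt = clock();
    let ok = false;
    try {
      const result = await handler(args);
      ok = true;
      return result;
    } finally {
      // Runs on success and on failure, so error rate can be derived from `ok`.
      log({ tool, userId, latencyMs: clock() - startedAt, ok });
    }
  };
}
```

Because the record is built from the tool name, user ID, and timing only, sensitive inputs cannot leak into logs by accident.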

PITFALLS

Pitfalls.

Stateful by default

The session machinery is the part most likely to break in production. Start stateless. Add sessions only when you have a feature that genuinely needs them — resumability across reconnects, server-pushed notifications. Most MCPs never do.

One giant tool that takes a JSON blob

Models reason better about three well-named tools than one tool with a 20-field input. Split doEverything into listX, getX, createX. Smaller tools, sharper descriptions, fewer hallucinations.

Putting raw tool output in the model's context

Tool output is untrusted text that flows back to the model. Sanitize, structure as JSON, never paste raw scraped HTML — an attacker who controls the source controls your model. (See the security guide for the long version.)
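
A rough sketch of "sanitize, structure as JSON" for scraped pages. `toStructuredOutput` and `SafeToolOutput` are hypothetical names, and the regex stripping is an illustration rather than a complete HTML sanitizer: it removes markup, not malicious instructions written in visible text, so the result is still untrusted input.

```typescript
interface SafeToolOutput {
  source: string;
  text: string;
  truncated: boolean;
}

// Reduces scraped HTML to bounded plain text and wraps it in a JSON
// envelope so the model sees labelled data rather than a raw page.
function toStructuredOutput(source: string, rawHtml: string, maxChars = 2000): string {
  const text = rawHtml
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ') // drop script and style bodies
    .replace(/<!--[\s\S]*?-->/g, ' ') // drop HTML comments
    .replace(/<[^>]*>/g, ' ') // drop remaining tags
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();

  const output: SafeToolOutput = {
    source,
    text: text.slice(0, maxChars),
    truncated: text.length > maxChars
  };
  return JSON.stringify(output);
}
```

The tool would return this string as its text content; the `source` and `truncated` fields let the model, and anyone reading traces later, see where the text came from and whether it was cut.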

No description, or a one-word description

The description is how the model decides whether to call your tool. "Returns the company's internal documentation matching q; use this before answering questions about policies, runbooks, or product internals" picks up calls; "search" does not.