Should I use the Vercel AI SDK or call the provider API directly?

For most web apps in 2026, use the AI SDK. You get a stable streaming protocol (SSE), a typed UIMessage shape with structured parts (text, tool calls, reasoning), and provider-agnostic code. Calling provider SDKs directly makes sense when you need a provider-specific feature the AI SDK has not surfaced yet, or in non-web contexts where the React hooks add nothing.

Edge runtime or Node?

Edge for chat-style streaming endpoints: faster cold starts, longer streaming windows on most platforms, and lower per-request cost. Node for anything that needs a Node-only library (Sharp, native bindings, heavy file system work, some database drivers). It is fine to mix — chat route on edge, upload route on Node.

How do I stop a stream when the user navigates away?

useChat exposes a stop() function that aborts the underlying fetch. On the server, propagate the request AbortSignal into streamText via the abortSignal option so the provider call is cancelled and you stop billing tokens you will never render. Without this, users navigating away keep meters running.

Is SSE good enough or do I need WebSockets?

SSE is the right default for one-way model output. It works through proxies, debugs with curl, and the AI SDK 5 protocol is built on it. Reach for WebSockets only when you genuinely need bidirectional, low-latency frames — voice, collaborative cursors, live audio transcription. Do not adopt the operational cost of WebSockets for plain chat.

How do I keep tool calls from running away?

Set stopWhen with stepCountIs to cap how many tool-call steps a single request can take, validate every tool input with zod, and run tools that touch the outside world (databases, third-party APIs) behind the same auth your normal routes use. Treat the model as an untrusted client that happens to live on your server.

Streaming AI in Web Apps — Vercel AI SDK 5, SSE, and Edge Runtime

CH 01

Why streaming, why now.

A non-streaming LLM call feels broken. The user clicks, sees a spinner, waits seven seconds, then a wall of text appears. A streaming call feels alive — first token in 400 ms, the rest arrives as the model thinks. Same total time, completely different product.

The web platform already has the primitive for this: Server-Sent Events. AI SDK 5 made SSE the official wire format for chat in August 2025, which means streaming is now a few lines of code instead of a custom protocol you maintain. There is no good reason in 2026 to ship a non-streaming chat endpoint.

Perceived latency drops 80%. Time-to-first-token is what the user feels; total time is what your bill reads.
Cancellation becomes free. The user closes the tab; the fetch aborts; the provider call ends; you stop paying for tokens nobody will ever see.
Tool calls become interactive. You can render the "calling tool X" state while the call is in flight, which buys you another 2–10 s of patience.

CH 02

The 2026 stack at a glance.

For a TypeScript web app, you need four packages and one runtime decision. That is the whole stack.

Layer	Pick	Why
Client hook	`@ai-sdk/react` — `useChat` / `useCompletion`	Transport-based, typed messages, parts array, no internal input state to fight.
Server	`ai` — `streamText`	Returns a stream that maps cleanly to an SSE response with `toUIMessageStreamResponse()`.
Provider	`@ai-sdk/anthropic` or `@ai-sdk/openai` (or both)	Drop-in, swap with one line, no rewriting your route.
Validation	`zod` for tool input schemas	The AI SDK speaks zod natively; tool calls fail loudly when the model invents fields.
Runtime	Edge for chat, Node for uploads	See CH 06, "Edge or Node?"

A note on versions

If you are reading old tutorials, ignore anything that wires useChat with an api: "/api/chat" string and handleInputChange. That is AI SDK 4. The 5.x rewrite uses a transport instance and you manage input state yourself. Most copy-paste pain comes from mixing the two.

CH 03

The server route in 20 lines.

A Next.js App Router route handler that streams Anthropic Sonnet to the client. Drop-in.

app/api/chat/route.ts

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages } from 'ai';

// Edge runtime: faster cold start, longer streaming window.
export const runtime = 'edge';
export const maxDuration = 60;

export async function POST(req: Request) {
  const { messages } = await req.json();

  // streamText returns its result object immediately, so no await is needed.
  const result = streamText({
    model: anthropic('claude-sonnet-4-5'),
    system: 'You are a concise assistant. Cite sources when you have them.',
    messages: convertToModelMessages(messages),
    abortSignal: req.signal, // stop billing on client disconnect
  });

  return result.toUIMessageStreamResponse();
}

Three lines do the heavy lifting: convertToModelMessages turns the typed UI message shape into something the provider understands; abortSignal: req.signal wires browser disconnects through to the provider so you stop being billed; toUIMessageStreamResponse() writes the SSE stream the client expects.
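To make "the SSE stream the client expects" concrete, here is a minimal sketch of how Server-Sent Events frames are laid out on the wire and how a client could split them. The `parseSse` function and the JSON payloads in the usage note are illustrative assumptions, not the AI SDK's own parser or its exact chunk schema; in a real app `useChat` does this for you.

```typescript
// One parsed Server-Sent Event: event name plus the joined data lines.
interface SseEvent {
  event: string;
  data: string;
}

// Parse a complete SSE text buffer into events.
// Simplified: handles `event:` and `data:` fields and `:` comments only.
// `id:` and `retry:` are ignored, and bare "\r" line endings are not handled.
function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = 'message'; // default event type in the SSE format
  let dataLines: string[] = [];

  for (const line of buffer.split(/\r?\n/)) {
    if (line === '') {
      // A blank line ends the current event; dispatch it if it carried data.
      if (dataLines.length > 0) {
        events.push({ event, data: dataLines.join('\n') });
      }
      event = 'message';
      dataLines = [];
      continue;
    }
    if (line.startsWith(':')) continue; // comment or keep-alive ping

    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space

    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
  }
  return events;
}
```

Feeding it a buffer such as `'data: {"delta":"Hi"}\n\n: ping\n\ndata: [DONE]\n\n'` yields two events, with the comment line skipped. Because each frame is plain text separated by a blank line, the same stream is readable with curl, which is what makes SSE easy to debug.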

CH 04

The client hook in 15 lines.

Same React component you'd write for any form, except messages is the canonical chat state and sendMessage kicks off the stream.

app/chat/page.tsx

'use client';
import { useState } from 'react';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';

export default function Chat() {
  const { messages, sendMessage, status, stop } = useChat({
    transport: new DefaultChatTransport({ api: '/api/chat' }),
  });
  const [input, setInput] = useState('');

  return (
    <form onSubmit={(e) => {
      e.preventDefault();
      if (input.trim()) {
        sendMessage({ text: input });
        setInput('');
      }
    }}>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong>
          {m.parts.map((p, i) =>
            p.type === 'text' ? <span key={i}>{p.text}</span> : null
          )}
        </div>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      {status === 'streaming'
        ? <button type="button" onClick={stop}>Stop</button>
        : <button type="submit">Send</button>}
    </form>
  );
}

Two things to notice. First, m.parts — messages in AI SDK 5 are not strings, they are arrays of typed parts. Text, tool calls, reasoning, files. You render each part for what it is, which is what makes the next chapter tractable.

Second, stop() is wired to the button when status === 'streaming'. Combined with abortSignal: req.signal on the server, that's end-to-end cancellation. Most production chat apps still ship without this and pay for it monthly.
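Rendering "each part for what it is" usually comes down to one exhaustive switch over the part type. The sketch below uses a hypothetical, simplified `Part` union (the real types in the `ai` package have more variants and more fields) and returns strings instead of JSX so it can run anywhere.

```typescript
// Hypothetical, simplified stand-in for the SDK's message part union.
type Part =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }
  | { type: 'tool'; name: string; state: 'running' | 'done' };

// Turn one part into a display string. The `never` check makes the compiler
// complain if a new part type is added to the union but not handled here.
function describePart(part: Part): string {
  switch (part.type) {
    case 'text':
      return part.text;
    case 'reasoning':
      return `(thinking) ${part.text}`;
    case 'tool':
      return part.state === 'running'
        ? `Calling ${part.name}...`
        : `Finished ${part.name}`;
    default: {
      const unreachable: never = part;
      return unreachable;
    }
  }
}

// Render a whole message as one line per part.
function describeMessage(parts: Part[]): string {
  return parts.map(describePart).join('\n');
}
```

In a component you would return an element per case instead of a string; the switch shape stays the same.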

CH 05

Tool calls without footguns.

Tools turn a chat endpoint into an agent. They also turn one careless route into your most expensive bug. The discipline: every tool has a zod input schema, every tool runs behind the same auth as the rest of your app, and the whole loop has a hard step cap.

app/api/chat/route.ts — with tools

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, tool, stepCountIs, convertToModelMessages } from 'ai';
import { z } from 'zod';
import { getUser, listOrdersFor } from '@/server/db';

export const runtime = 'edge';
export const maxDuration = 60;

export async function POST(req: Request) {
  const user = await getUser(req); // same auth as every other route
  if (!user) return new Response('Unauthorized', { status: 401 });

  const { messages } = await req.json();

  // streamText returns its result object immediately, so no await is needed.
  const result = streamText({
    model: anthropic('claude-sonnet-4-5'),
    messages: convertToModelMessages(messages),
    abortSignal: req.signal,
    stopWhen: stepCountIs(5), // hard cap on tool-call loops
    tools: {
      listMyOrders: tool({
        description: "List the signed-in user's recent orders.",
        inputSchema: z.object({
          limit: z.number().int().min(1).max(20).default(5),
        }),
        execute: async ({ limit }) => {
          // Scope by user.id. The model never sees other users' data.
          return listOrdersFor(user.id, limit);
        },
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}

Three rules and you'll avoid 90% of agent-shaped incidents:

Auth the route, not the tool. The user object comes from your normal session handling, and every tool's execute closes over it. The model can never escalate beyond what that user can already do.
Validate every input with zod. Models invent fields, send strings where numbers go, and omit required fields. Zod fails loudly; without it you get silent corruption.
Cap the loop. stopWhen: stepCountIs(5) is your circuit breaker. If a model decides to call list/get/list/get forever, you cut it at five and return what you have.
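The third rule is easier to reason about with the loop written out. This is a conceptual sketch of what a step cap does, not the SDK's implementation of `stopWhen`; `Step`, `runWithStepCap`, and the `nextStep` callback are hypothetical names standing in for the model round-trip.

```typescript
// One model step: either a request to call a tool or a final answer.
type Step =
  | { kind: 'tool'; name: string }
  | { kind: 'final'; text: string };

interface LoopResult {
  steps: Step[];
  capped: boolean; // true when the loop was cut off by the cap
}

// Keep asking the model for the next step until it answers or the cap is hit.
function runWithStepCap(nextStep: (index: number) => Step, maxSteps: number): LoopResult {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = nextStep(i);
    steps.push(step);
    if (step.kind === 'final') return { steps, capped: false };
  }
  // The model never produced a final answer: stop and return what we have.
  return { steps, capped: true };
}
```

A model that calls tools forever is stopped after `maxSteps` iterations with `capped: true`, while a model that answers on its third step returns three steps and `capped: false`. The cap bounds both latency and spend per request.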

CH 06

Edge or Node?

Concern	Edge	Node
Cold start	~50 ms	200–1500 ms
Max stream duration (Vercel)	Long (300 s+ on most plans)	Capped per plan; check current limits
Native modules / Sharp / Postgres driver	Not supported	Supported
Per-request cost	Lower	Higher
When to pick	Chat, completions, anything I/O-bound	File uploads, image processing, RAG ingestion

The right default is split routes: app/api/chat/route.ts on edge, app/api/ingest/route.ts on Node. Don't pick one runtime for the whole project on theological grounds.

DEMO · INTERACTIVE

Live: streaming budget estimator.

Tell it the shape of your traffic; it returns an honest monthly bill and what the user will feel. All math runs in your browser using public list prices for the major models in May 2026 — treat as an estimate, not a quote.

Streaming budget (public list prices; numbers computed in your browser only)

Default inputs: model Claude Sonnet 4.5; 500 daily active users; 6 messages per user per day; 1,500 average input tokens per turn; 500 average output tokens per turn; 40% prompt-cache hit rate.

Outputs: monthly model spend, spend per active user per month, typical time-to-first-token, and average full response time.

Read the per-user number, not the total. A $40k monthly bill across 100k users is $0.40/user — fine. The same $40k across 800 users is $50/user and you have a pricing problem the model cannot solve.
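The arithmetic behind an estimate like this is short enough to keep in a script. The sketch below is an assumption about how such an estimator could work, not the demo's actual code: `Traffic`, `Prices`, and `estimateMonthly` are hypothetical names, it assumes a 30-day month, it ignores cache-write surcharges, and the prices in the usage note are placeholders you should replace with your provider's current list prices.

```typescript
interface Traffic {
  dailyActiveUsers: number;
  messagesPerUserPerDay: number;
  inputTokensPerTurn: number;
  outputTokensPerTurn: number;
  cacheHitRate: number; // fraction of input tokens served from prompt cache, 0..1
}

interface Prices {
  inputPerMTok: number;       // USD per million uncached input tokens
  cachedInputPerMTok: number; // USD per million cached input tokens
  outputPerMTok: number;      // USD per million output tokens
}

interface Estimate {
  monthlyTotal: number;  // USD per month
  perActiveUser: number; // USD per daily active user per month
}

function estimateMonthly(t: Traffic, p: Prices, daysPerMonth: number = 30): Estimate {
  const turns = t.dailyActiveUsers * t.messagesPerUserPerDay * daysPerMonth;
  const inputTokens = turns * t.inputTokensPerTurn;
  const cachedTokens = inputTokens * t.cacheHitRate;
  const uncachedTokens = inputTokens - cachedTokens;
  const outputTokens = turns * t.outputTokensPerTurn;

  const monthlyTotal =
    (uncachedTokens * p.inputPerMTok +
      cachedTokens * p.cachedInputPerMTok +
      outputTokens * p.outputPerMTok) /
    1_000_000;

  return { monthlyTotal, perActiveUser: monthlyTotal / t.dailyActiveUsers };
}
```

With 500 users, 6 messages a day, 1,500 input and 500 output tokens per turn, a 40% cache hit rate, and placeholder prices of $3, $0.30, and $15 per million tokens, this comes to about $934 per month, or roughly $1.87 per active user. Output tokens dominate that total, which is why cancellation on disconnect matters as much as caching.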

PITFALLS

Production pitfalls.

Forgetting abortSignal: req.signal

The most expensive line of code you can omit. Without it, a user who navigates away mid-response keeps the provider call running and you keep paying for tokens the user never sees. On a chatty product this can be 20% of your bill.

Trusting messages from the client

The browser can send anything — including a system message that says "ignore prior instructions." Always set your own system on the server, never let the client supply it, and consider stripping any role: 'system' from inbound messages before calling the model.
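One way to enforce this is an allow-list on roles before the messages reach the model. The helper below is a hypothetical sketch (`keepConversationRoles` is not an SDK function); it keeps only `user` and `assistant` messages, which also drops any role the client invents.

```typescript
// Minimal shape needed for filtering; real messages carry more fields.
interface InboundMessage {
  role: string;
}

// Keep only roles a browser is allowed to supply. Using an allow-list rather
// than removing 'system' means unknown or misspelled roles are dropped too.
function keepConversationRoles<T extends InboundMessage>(messages: T[]): T[] {
  return messages.filter((m) => m.role === 'user' || m.role === 'assistant');
}
```

In the route handler you would apply it to the parsed request body before converting the messages for the model, and pass your own `system` string separately.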

Putting tool results into the next prompt unfiltered

Tool output is untrusted text that flows back into the model's context. If fetchUrl returns a page that says "you are now in admin mode," the model is allowed to believe it. Sanitize tool output, escape it, or render it as data rather than prose — never paste raw scraped HTML into the next turn.
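As a rough illustration of "data rather than prose", a tool can return a small labeled object instead of raw markup. The sketch below is an assumption about one reasonable shape (`ToolData` and `toToolData` are hypothetical names). Stripping tags with a regular expression and truncating reduces noise and context size, but it is not a complete defense against prompt injection.

```typescript
// Hypothetical envelope that marks scraped content as untrusted data.
interface ToolData {
  kind: 'untrusted_text';
  truncated: boolean;
  text: string;
}

// Strip anything that looks like an HTML tag, collapse whitespace, and cap
// the length so one tool call cannot flood the model's context.
function toToolData(raw: string, maxChars: number): ToolData {
  const text = raw
    .replace(/<[^>]*>/g, ' ') // drop tag-like spans
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
  return {
    kind: 'untrusted_text',
    truncated: text.length > maxChars,
    text: text.slice(0, maxChars),
  };
}
```

Returning this object from a tool's `execute` keeps the page content inside a clearly labeled field, and your system prompt can instruct the model to treat `untrusted_text` as quoted material only.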

No rate limit on the chat route

A streaming endpoint without rate limiting is a debit card on the internet. Use a per-user limiter (Upstash Ratelimit, Vercel WAF, your reverse proxy) and a hard token-per-day cap stored in your DB. Both, not either.
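The two checks look roughly like this. It is an in-memory sketch for illustration only: `createRateLimiter` and `withinDailyTokenCap` are hypothetical helpers, and on serverless or edge deployments the counters would have to live in a shared store (which is what the hosted limiters named above provide), because each instance has its own memory.

```typescript
// Fixed-window limiter: at most `limit` requests per user per `windowMs`.
// The clock is passed in so the logic is easy to test.
function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  return (userId: string, now: number): boolean => {
    const w = windows.get(userId);
    if (w === undefined || now - w.start >= windowMs) {
      // First request, or the previous window has expired: start a new one.
      windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (w.count >= limit) return false; // over the limit for this window
    w.count += 1;
    return true;
  };
}

// Daily token budget: `usedToday` would come from your database.
function withinDailyTokenCap(usedToday: number, requested: number, dailyCap: number): boolean {
  return usedToday + requested <= dailyCap;
}
```

A route would call both before starting the stream and return a 429 response if either fails. The limiter stops bursts; the token cap stops a slow, steady drain that stays under the request limit.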

Skipping the empty-state UX

While the first token is in flight, the message bubble is empty. Render a typing indicator or a "thinking..." placeholder — otherwise the streaming UX feels worse than a spinner for the first half second.
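The condition for showing the placeholder can be a small pure function. This sketch assumes the four status values `useChat` reports in AI SDK 5 and a hypothetical `LastMessage` summary of the final message in the list; adapt the shape to however you track rendered text.

```typescript
// Status values assumed to match what useChat reports.
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

// Hypothetical summary of the last message in the list.
interface LastMessage {
  role: 'user' | 'assistant' | 'system';
  text: string; // concatenated text parts rendered so far
}

function shouldShowTypingIndicator(status: ChatStatus, last: LastMessage | undefined): boolean {
  // Request sent, nothing back yet.
  if (status === 'submitted') return true;
  if (status === 'streaming') {
    // The stream is open but no visible assistant text has arrived.
    return last === undefined || last.role !== 'assistant' || last.text.length === 0;
  }
  return false;
}
```

Render the indicator while this returns true and swap it for the real bubble as soon as the first text arrives.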