March 19, 2026 AI Infrastructure

Cloudflare Workers AI Enters the Large Model Game, Starting With Kimi K2.5

Cloudflare has historically focused Workers AI on smaller models, but that era is over. The platform now serves frontier-scale open-source models, starting with Moonshot AI's Kimi K2.5 — a model with a full 256k context window, multi-turn tool calling, vision inputs, and structured outputs. The goal: run the entire agent lifecycle on a single unified platform.

The numbers backing this move are hard to ignore. Cloudflare tested Kimi K2.5 on an internal security review agent that processes over 7 billion tokens per day. Running the same workload on a mid-tier proprietary model would have cost roughly $2.4M annually. With Kimi K2.5 on Workers AI, they cut costs by 77% — and the model caught more than 15 confirmed security issues in a single codebase.

The Model

Kimi K2.5 is a frontier-scale open-source model built for agentic workloads. It supports reasoning, multi-turn tool calling, vision inputs, and structured outputs across a 256k token context window. Pricing sits at $0.60 per million input tokens, $0.10 per million cached input tokens, and $3.00 per million output tokens — significantly below proprietary alternatives at comparable quality.

Platform Improvements for Agents

Alongside the model launch, Cloudflare shipped several infrastructure features designed for agentic workloads:

Prefix caching with discounted pricing — cached input tensors from previous requests skip the prefill stage, reducing Time to First Token and improving throughput. Cached tokens are now priced lower than standard input tokens.
Session affinity headers — a new x-session-affinity header routes requests to the same model instance, dramatically improving cache hit rates across multi-turn conversations.
Revamped async API — a pull-based system replaces the old push-based approach, processing queued requests as GPU capacity becomes available. No more Out of Capacity errors for durable workloads like code scanning and research agents.

Why It Matters

With every developer potentially running multiple agents processing hundreds of thousands of tokens per hour, cost is now the primary blocker to scaling — not capability. Cloudflare is betting that the shift from proprietary to open-source frontier models is inevitable, and that the platform that makes serverless inference of large models effortless will capture that transition. Workers AI already has the surrounding primitives: Durable Objects for state, Workflows for long-running tasks, the Agents SDK for orchestration. Kimi K2.5 fills the last gap — the model itself.

Source: blog.cloudflare.com