May 18, 2026

Cursor Ships Composer 2.5 and Begins Training a 10x Larger Model on Colossus 2

Cursor released Composer 2.5, the fourth Composer model in seven months, and made it the new default in the Cursor model picker. The headline numbers are jumps from 61.7 to 69.3 on Terminal-Bench 2.0 and from 52.2 to 63.2 on CursorBench v3.1, gains Cursor attributes to 25× more synthetic training tasks, tougher RL environments, and new learning methods on top of the same Moonshot Kimi K2.5 base. In the same post, Cursor announced it is now training a much larger model from scratch with SpaceXAI on Colossus 2.

What Actually Changed

Composer 2.5 is a post-training story, not a base-model swap. Cursor kept Kimi K2.5 underneath and pushed the gains through scaled RL and a much larger bank of synthetic coding environments. The pitch from the team: better sustained work on long-running tasks, more reliable execution of complex instructions, and a model that's "more pleasant to collaborate with"—a category Cursor explicitly calls out as not well-captured by benchmarks but heavily weighted in their own evals.

Against the frontier, Composer 2.5 still trails Opus 4.7 and GPT-5.5 on most coding benchmarks, but it edges past GPT-5.5 by about 2 points on SWE-Bench Multilingual (79.8%) and narrows the gap on CursorBench. The point isn't to win benchmarks outright. It's to be the model you reach for when latency and cost matter, which is most of a typical workday.
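To put the two headline benchmark jumps on a common footing, here is a minimal Python sketch that turns the before/after scores quoted above into absolute and relative gains. The scores come from this article; the `gain` helper and the choice to round to one decimal place are illustrative assumptions, not anything Cursor published.

```python
# Before/after scores as quoted in this article.
SCORES = {
    "Terminal-Bench 2.0": (61.7, 69.3),
    "CursorBench v3.1": (52.2, 63.2),
}


def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent).

    Hypothetical helper for illustration; both values are rounded
    to one decimal place.
    """
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        points, relative = gain(before, after)
        print(f"{name}: +{points} points ({relative}% relative)")
```

On these figures the CursorBench v3.1 gain (11.0 points) is larger than the Terminal-Bench 2.0 gain (7.6 points) in both absolute and relative terms.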

Pricing That Reframes the Math

The standard tier costs $0.50/M input tokens and $2.50/M output tokens. A faster variant with the same intelligence costs $3.00/M input and $15.00/M output and is the default in the picker. For reference, Opus 4.7 charges $25/M output and GPT-5.5 charges $30/M output, so Composer 2.5 standard costs roughly a tenth of either per output token, and the fast tier still undercuts the fast tiers of both frontier models. Composer 2.5 ships with double included usage for the first week.
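The pricing arithmetic above can be sketched in a few lines of Python. The per-million-token rates are the ones quoted in this article; the session size (2M input tokens, 200k output tokens) is a made-up workload for illustration, and the `Price`, `session_cost`, and `output_ratio` names are hypothetical. Because only output rates are quoted for Opus 4.7 and GPT-5.5, the frontier comparison is limited to output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Composer 2.5 rates as quoted in the article.
COMPOSER_STANDARD = Price(input_per_m=0.50, output_per_m=2.50)
COMPOSER_FAST = Price(input_per_m=3.00, output_per_m=15.00)

# Only output rates are quoted for the frontier models.
FRONTIER_OUTPUT_PER_M = {"Opus 4.7": 25.00, "GPT-5.5": 30.00}


def session_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one session at the given rates."""
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def output_ratio(price: Price, frontier_output_per_m: float) -> float:
    """Composer output rate as a fraction of a frontier output rate."""
    return price.output_per_m / frontier_output_per_m


if __name__ == "__main__":
    # Hypothetical workload, chosen only to make the numbers concrete.
    input_tokens, output_tokens = 2_000_000, 200_000
    for label, price in (("standard", COMPOSER_STANDARD), ("fast", COMPOSER_FAST)):
        cost = session_cost(price, input_tokens, output_tokens)
        print(f"Composer 2.5 {label}: ${cost:.2f} per session")
    for model, rate in FRONTIER_OUTPUT_PER_M.items():
        ratio = output_ratio(COMPOSER_STANDARD, rate)
        print(f"Standard output rate vs {model}: {ratio:.0%}")
```

Under this assumed workload the standard tier comes to $1.50 per session and the fast tier to $9.00. The standard output rate is 10% of the quoted Opus 4.7 rate and about 8% of the quoted GPT-5.5 rate, which is where "roughly a tenth" comes from.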

The trade-off Cursor is making is explicit: pick the model that's specifically tuned for the agent loop you're already in, pay an order of magnitude less per token, and the savings fund longer sessions, more parallel agents, and the iteration loops that actually move work forward.

Colossus 2 and the 10× Model

The second half of the announcement is the more consequential one. Cursor confirmed it is working with SpaceXAI—the AI division of SpaceX after the April acquisition disclosure—to train a "significantly larger model from scratch, using 10× more total compute," tapping the Colossus 2 supercomputer's millions of H100-equivalent GPUs. Cursor frames this as a major leap rather than another tuning pass.

This puts Cursor in a different league as a model lab. Composer 1 was a tuning project on someone else's base; Composer 2.5 is an aggressive post-training effort; the next model is a from-scratch pretraining run on the largest GPU cluster anyone has talked about publicly. The implication for the model market: Cursor isn't trying to be a thin layer on top of Anthropic and OpenAI anymore—it's competing directly for frontier coding capability on its own training stack.

Why It Matters for Web Developers

For day-to-day usage in Cursor, the calculus is now hard to argue with: Composer 2.5 is fast enough, smart enough, and cheap enough to be the default for the bulk of agent work, with Opus 4.7 or GPT-5.5 reserved for the harder cases. That's roughly how the model picker is already configured.
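The default-then-escalate pattern described above can be expressed as a small routing rule. This is a sketch under stated assumptions, not Cursor's picker logic or API: the model identifier strings, the `Task` fields, and the two-failed-attempts threshold are all hypothetical, and what counts as a "harder case" is left to the developer.

```python
from dataclasses import dataclass

# Hypothetical identifiers; real model names in any given tool may differ.
DEFAULT_MODEL = "composer-2.5"
ESCALATION_MODEL = "opus-4.7"  # could equally be "gpt-5.5"


@dataclass
class Task:
    description: str
    failed_attempts: int = 0    # how often the default model already failed
    flagged_hard: bool = False  # set explicitly by the developer


def pick_model(task: Task, max_default_attempts: int = 2) -> str:
    """Use the cheap default unless the task is flagged or keeps failing."""
    if task.flagged_hard or task.failed_attempts >= max_default_attempts:
        return ESCALATION_MODEL
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(pick_model(Task("rename a prop across components")))
    print(pick_model(Task("untangle a flaky migration", failed_attempts=2)))
```

The design choice is to escalate on evidence (an explicit flag or repeated failure) rather than up front, so the expensive model is only paid for when the cheap one has not been enough.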

The longer-horizon read is that the gap between IDE vendors and model labs is collapsing. When the IDE you live in also ships the coding model, and trains a new one on Colossus 2, the question of which model to use stops being separate from the question of which editor to use. That's a structural shift, and Composer 2.5 is the part of it developers can build with today.

Source: cursor.com