May 28, 2026 AI Models

Claude Opus 4.8 Lands With Record Coding Scores, Effort Control, and Dynamic Workflows

Anthropic shipped Claude Opus 4.8 just 42 days after Opus 4.7—the shortest gap between Opus releases yet. It's not a generational leap; Anthropic calls it "a modest but tangible improvement." But it sets a new record for publicly available models on agentic coding, ships at the same price as 4.7, and launches alongside two features that matter more than the model bump: user-facing effort control and Dynamic Workflows in Claude Code.

The Benchmarks

The number that leads is 69.2% on SWE-Bench Pro, up from 64.3% on Opus 4.7—a record for any publicly available model. The rest of the card backs the agentic framing:

OSWorld-Verified (computer use): 83.4% — top of the category.
Online-Mind2Web (browser agents): 84% — surpassing Opus 4.7.
Reasoning with tools: 57.9% — best across the models Anthropic tested.
Terminal-Bench 2.1: 74.6% — strong, though GPT-5.5 still leads this one at 78.2%.

On CursorBench, Opus 4.8 exceeds prior Opus models at every effort level, and Anthropic highlights meaningfully more efficient tool calling—fewer steps for the same result, which is exactly the axis that compounds in long agent runs.

The Honesty Upgrade

The most consequential change isn't on a leaderboard. Anthropic reports Opus 4.8 is roughly 4× less likely than Opus 4.7 to let flaws in its own code pass without flagging them. Early testers describe a model that pushes back when a plan isn't sound and asks clarifying questions before making irreversible changes.

In agentic contexts that reliability compounds. A coding agent that quietly ships a subtle bug at step 7 of a 40-step task poisons everything downstream; a model that stops and says "this looks wrong, are you sure?" is worth more than a couple of benchmark points. Anthropic's own framing on launch day: Opus 4.8 is "more likely to flag uncertainties about its work and less likely to make unsupported claims."

Effort Control Comes to Everyone

Opus 4.8 introduces a user-facing effort control sitting alongside the model selector in claude.ai and Cowork—available on all plans. The levels are default (high), extra, and max. Higher effort means Claude thinks more frequently and more deeply for better results on hard problems; lower effort means faster responses that burn rate limits more slowly.

For developers running long async workflows, Anthropic specifically recommends the "extra" setting—enough thinking budget for sustained multi-step work without paying max-tier latency on every turn. This is the same effort-tier idea that's been creeping into the whole field (OpenAI's reasoning levels, the xhigh tier Opus 4.7 introduced), now exposed as a first-class user control rather than an API-only knob.

Dynamic Workflows: Hundreds of Subagents, One Session

The headline launch-day feature is Dynamic Workflows in Claude Code, available in research preview for Enterprise, Team, and Max plans. It lets Claude plan a large task, then spin up hundreds of parallel subagents in a single session—each planning, executing, and verifying its slice of the work—before Claude checks the combined output against a bar you set and reports back.

The flagship use case is the one that has resisted automation the longest: codebase-scale migrations. Anthropic's example is Claude Code with Opus 4.8 carrying a migration across hundreds of thousands of lines "from kickoff to merge, with the existing test suite as its bar." With Opus 4.8 the subagents can also run longer, which is what makes a full migration—not just a proof of concept—feasible in one session.

This is the natural successor to the orchestration story Anthropic started with Multi-Agent Orchestration at its May Dev Conference, and it leans on the same agent runtime now hardened by self-hosted sandboxes. The pieces are starting to click into a single coherent system: a model that doesn't lie about its own work, an orchestrator that fans out across hundreds of workers, and a sandbox that runs inside your perimeter.

Pricing and Availability

Regular pricing is unchanged from Opus 4.7: $5 per million input tokens, $25 per million output. Fast mode runs at 2.5× the speed for $10/$50 per million—and that fast-mode price is 3× cheaper than fast mode was on previous Opus models, a real cost cut hiding inside a same-price headline.

Opus 4.8 is available everywhere today via claude-opus-4-8 on the Claude API, plus Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The same-day availability across all three hyperscalers is itself a sign of how the Claude Platform on AWS day-one-parity commitment is playing out.

And Mythos Is Coming

One forward-looking note dropped alongside the launch. Anthropic positioned Opus 4.8 as "the best you can build on right now, outside the Mythos-class systems we're still testing under Project Glasswing"—and said it expects Mythos-class models to reach all customers "in the coming weeks," once stronger safeguards are in place. The previously-indefinite restriction now has a near-term timeline.

Why It Matters for Web Developers

The practical read: if you're already on Opus, upgrading is free and strictly better—same price, higher coding scores, more efficient tool calling, and a model that's far less likely to ship you a silent bug. Point your claude-opus-4-8 identifier at it and the cost curve doesn't move while the quality does.

The bigger shift is Dynamic Workflows. Codebase-scale migrations—framework upgrades, API deprecations, a 200K-line TypeScript strict-mode conversion—have always been the work nobody wants to staff because it's enormous, mechanical, and risky. An orchestrator that fans out hundreds of verifying subagents against your own test suite turns that from a quarter-long project into a single session. It's a research preview, so treat it as one—but it's the clearest preview yet of what "the agent did the migration overnight" actually looks like.

Source: anthropic.com