On June 12, Moonshot AI released Kimi K2.7-Code, the first model in the Kimi K2 family with "Code" in the name rather than as a console flag. It is a 1-trillion-parameter mixture-of-experts model with 32 billion active parameters, tuned for long-horizon software engineering: planning, editing, running tools, and debugging across many steps. Weights ship on Hugging Face under a Modified MIT license. The Moonshot API exposes it at kimi-k2.7-code through OpenAI-compatible endpoints.
What Changed From K2.6
K2.7-Code keeps the same broad architecture as K2.5 and K2.6: 1T total / 32B active MoE, 256K context window, thinking-only inference. The gains Moonshot reports are efficiency and coding specialization rather than a new base:
- ~30% fewer reasoning tokens on average compared to K2.6, which directly lowers inference cost on agentic loops that burn tokens in chain-of-thought.
- +21.8% on Kimi Code Bench v2, Moonshot's internal coding benchmark, versus K2.6.
- Stronger tool-use scores on company-run evals including MCPMark Verified, where Moonshot reports K2.7-Code ahead of Claude Opus 4.8.
- Better instruction compliance in long contexts, with fewer "overthinking" detours before acting.
Independent verification is still pending. Moonshot's headline benchmarks are vendor-run. Teams evaluating routing should A/B against their own repos before swapping production defaults.
Pricing and Access
The API price card is aggressive relative to closed frontier models:
- Input: $0.95 per million tokens
- Output: $4.00 per million tokens
- Compare: Claude Opus 4.8 at $5.00 / $25.00 per million; Claude Fable 5 at $10.00 / $50.00
Three access paths ship day one:
- Moonshot API with OpenAI- or Anthropic-SDK compatibility (swap the base URL, set
model: "kimi-k2.7-code"). - Kimi Code, Moonshot's terminal and IDE coding agent, which routes tasks through the K2 family natively.
- Hugging Face weights at
moonshotai/Kimi-K2.7-Codefor self-hosting via vLLM or SGLang.
On June 15, Moonshot also rolled out K2.7-Code HighSpeed (kimi-k2.7-code-highspeed), a faster variant reporting up to 6× throughput on coding tasks with median-length inputs and ~260 tok/s on shorter contexts.
Integration Constraints
K2.7-Code runs exclusively in thinking mode. Moonshot fixed sampling parameters at temperature 1.0 and top_p 0.95. Overriding them returns a request error, so you cannot tune determinism the way you might with other models. Plan for that if your CI pipeline expects low-temperature outputs.
For teams already on K2.6 in a gateway or agent harness, the migration path is a one-line model string change. Same context window, same license family, same endpoint shape. The practical test is whether the 30% token reduction holds on your workloads and whether quality regresses on your hardest tasks.
Where It Sits in the Stack
K2.7-Code lands in a crowded week for coding models. Claude Fable 5 pushed Mythos-class capability to general subscribers. Cursor Composer 2.5 trains on Kimi K2.5 bases for IDE-native agents. K2.7-Code is Moonshot's bet that the open-weight lane wins on unit economics: most of the frontier's agentic coding quality at roughly a twelfth of Opus token cost, with weights you can actually host.
The Modified MIT license permits commercial use with attribution requirements at scale, which matters for teams building products on top of the weights rather than just calling the API.
Why It Matters for Web Developers
If you run agent loops in CI, power a coding agent with multiple model tiers, or self-host inference to keep code on your network, K2.7-Code is the most practical open-weight upgrade path this month. The low-risk evaluation: point an existing OpenAI-compatible client at kimi-k2.7-code, rerun your last five failed agent tasks from K2.6, and compare token spend plus diff quality.
For subscription users of Kimi Code, the model is already the default coding SKU. For everyone else, Hugging Face plus vLLM is the self-host path if API egress or data residency rules out Moonshot's cloud. Either way, the K2 family's fifth major release in under a year signals Moonshot is keeping a distinct coding line, not just a general model with a coding preset.