Kimi K2: Moonshot AI’s Trillion-Parameter Agentic Model, Now Available at NetMind

When Chinese startup Moonshot AI unveiled Kimi K2 on July 11, 2025, it became the largest open-weight language model published to date: 1 trillion total parameters, of which 32 billion are activated for any given token. More than a size milestone, K2 is purpose-built for agents: LLMs that can call tools, write code, and finish multi-step jobs with minimal supervision.

Why K2 Matters

Open weights = open research. Like Meta’s Llama line, K2’s checkpoints live on GitHub and Hugging Face, letting anyone fine-tune or self-host.
Agent-first design. Moonshot trained K2 on 15.5 trillion tokens plus massive synthetic “tool-use” logs, so it natively decides when to call an external API and how to chain calls together. This also makes K2 a natural fit for Model Context Protocol (MCP) servers, which expose tools to a model through a standard interface.
MuonClip optimizer. Moonshot extends the Muon optimizer with a qk-clip step: after each Muon update, the query and key projection weight matrices are rescaled to directly control the magnitude of the attention logits, which tames gradient spikes and stabilizes trillion-parameter MoE training. Moonshot has not open-sourced this optimizer, and it may be a key factor in training K2 at this scale.
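
The qk-clip idea described above can be illustrated with a toy, single-head example. This is a minimal sketch and not Moonshot's implementation: the function names, the threshold `tau`, measuring the maximum logit on one small batch, and splitting the scale factor evenly (as a square root) between the two matrices are all assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def max_attention_logit(x, w_q, w_k):
    """Largest scaled dot product q_i . k_j / sqrt(d) over all token pairs in x."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    d = len(w_q[0])
    return max(
        sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
        for qi in q
        for kj in k
    )

def qk_clip(x, w_q, w_k, tau):
    """Hypothetical helper: rescale w_q and w_k so the max logit on x is at most tau."""
    s_max = max_attention_logit(x, w_q, w_k)
    if s_max <= tau:
        return w_q, w_k  # logits already within bounds, leave weights untouched
    # Logits are bilinear in (w_q, w_k), so scaling each matrix by sqrt(tau / s_max)
    # scales every logit by tau / s_max.
    scale = math.sqrt(tau / s_max)
    clipped_q = [[scale * v for v in row] for row in w_q]
    clipped_k = [[scale * v for v in row] for row in w_k]
    return clipped_q, clipped_k

# Toy data: two tokens, model width 2, head width 2.
x = [[1.0, 2.0], [3.0, -1.0]]
w_q = [[4.0, 0.0], [0.0, 4.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]

print("max logit before:", max_attention_logit(x, w_q, w_k))
new_q, new_k = qk_clip(x, w_q, w_k, tau=10.0)
print("max logit after: ", max_attention_logit(x, new_q, new_k))
```

In a real training loop a step like this would run after the optimizer update, typically per attention head; the sketch only shows why rescaling the two projection matrices bounds the logits.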

Benchmarks at a Glance

LiveCodeBench v6: 53.7% pass@1 (GPT-4.1: 44.7%).
SWE-bench Verified: 65.8% (agentic coding).
Tau2 Telecom (tool use): 65.8%, nearly double GPT-4.1's score.

(All scores from Moonshot’s public eval sheet.)

Get Started in Seconds on NetMind Inference

Deploying a model this large is difficult, and most teams just want an endpoint.
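
To make the scale concrete, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It uses the 1 trillion total parameter figure quoted earlier; the bytes-per-parameter values are generic precision sizes assumed for illustration, not a statement about how the checkpoint is shipped or served, and the estimate ignores the KV cache and activations.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion, the total parameter count quoted earlier

# Generic storage sizes per parameter (assumed for illustration).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, size):,.0f} GB")
```

Even at one byte per parameter, the weights alone come to roughly 1,000 GB, so self-hosting means sharding the model across many accelerators.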
At NetMind Inference we host Kimi-K2-Instruct with OpenAI-compatible semantics:

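from openai import OpenAI

# Point the OpenAI SDK at NetMind's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="<YOUR NetMind API Key>",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[
        {"role": "system", "content": "Act like you are a helpful assistant."},
        {"role": "user", "content": "Hi there!"},
    ],
    max_tokens=512,  # upper bound on the length of the generated reply
)

# Print only the assistant's reply text rather than the whole response object.
print(response.choices[0].message.content)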
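
Because the endpoint follows OpenAI's chat-completions conventions, the tool-use behavior described earlier can be sketched with the SDK's `tools` parameter. This is a sketch under assumptions: that the NetMind endpoint accepts `tools` for this model (not confirmed in this post), and that `get_weather`, its schema, and the `NETMIND_API_KEY` environment variable are hypothetical names invented for illustration.

```python
import json
import os

def get_weather(city: str) -> str:
    """Hypothetical local tool that returns a canned forecast as JSON."""
    return json.dumps({"city": city, "forecast": "sunny"})

# JSON-schema description of the tool, in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return a canned forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool(name: str, arguments_json: str) -> str:
    """Dispatch one model-requested tool call to a local Python function."""
    return LOCAL_TOOLS[name](**json.loads(arguments_json))

def ask_with_tools(prompt: str) -> str:
    """Send a prompt, execute any tool calls the model requests, return the final text."""
    from openai import OpenAI  # imported lazily so the helpers above run offline

    client = OpenAI(
        base_url="https://api.netmind.ai/inference-api/openai/v1",
        api_key=os.environ["NETMIND_API_KEY"],  # assumed variable name
    )
    model = "moonshotai/Kimi-K2-Instruct"
    messages = [{"role": "user", "content": prompt}]
    first = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content or ""

    # Feed each tool result back so the model can compose its final answer.
    messages.append(reply)
    for call in reply.tool_calls:
        result = run_tool(call.function.name, call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content or ""
```

Calling `ask_with_tools("What's the weather in Paris?")` with a valid key would exercise the full round trip; the dispatch helper `run_tool` can be tried on its own without network access.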

The Road Ahead

Moonshot hints that visual input and even longer contexts are coming next. For now, K2 already shows that open models can handle agentic workloads, and with NetMind Inference you can drop it into production with a single API key from day one.

Ready to build the next generation of autonomous apps? Give Kimi K2 a spin!