When Chinese startup Moonshot AI unveiled Kimi K2 on July 11, 2025, it became one of the largest open-weight language models ever published: 1 trillion total parameters, of which 32 billion are activated for any given token. More than a size milestone, K2 is purpose-built for agents: LLMs that can call tools, write code, and finish multi-step jobs with minimal supervision.
(All scores from Moonshot’s public eval sheet.)
Deploying a model this large is difficult, and most teams just want an endpoint.
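To put the deployment problem in numbers, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights. It uses only the parameter counts quoted above; the bytes-per-parameter values are illustrative assumptions about storage precision, not a statement of how K2 is actually served, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion total parameters (figure from the article)
ACTIVE_PARAMS = 32_000_000_000    # 32 billion activated per token (figure from the article)

# Bytes per parameter for two common storage precisions (illustrative assumptions).
BYTES_PER_PARAM = {"16-bit": 2, "8-bit": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction(total: int, active: int) -> float:
    """Return the share of parameters that are activated for a single token."""
    return active / total


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB of weights")
print(f"active per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
```

Only about 3% of the parameters are used for any one token, but in a typical mixture-of-experts deployment all of the weights still have to be available to the serving system, which is why the total figure drives the hardware bill.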
At NetMind Inference we host Kimi-K2-Instruct with OpenAI-compatible semantics:
from openai import OpenAI

# Point the OpenAI client at NetMind's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="<YOUR NetMind API Key>",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[
        {"role": "system", "content": "Act like you are a helpful assistant."},
        {"role": "user", "content": "Hi there!"},
    ],
    max_tokens=512,
)

# Prints the full response object; the reply text itself is in
# response.choices[0].message.content.
print(response)
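Since K2 is aimed at tool-calling agents, here is a minimal sketch of one tool round using the standard OpenAI chat-completions tools format. It assumes the NetMind endpoint passes the tools parameter through for this model, which the article does not confirm. The get_weather tool, the TOOL_REGISTRY dictionary, and the run_one_tool_round helper are hypothetical names made up for illustration.

```python
import json


# Hypothetical local tool; a real agent would call an actual service here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Maps tool names the model may request to local Python functions.
TOOL_REGISTRY = {"get_weather": get_weather}

# Tool schema in the OpenAI chat-completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short weather summary for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named local tool with JSON-encoded arguments and return its result."""
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool {name!r}"
    kwargs = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**kwargs)


def run_one_tool_round(client, user_prompt: str,
                       model: str = "moonshotai/Kimi-K2-Instruct") -> str:
    """Send a prompt, execute any tool calls the model requests, and return its final text.

    Assumes `client` is an openai.OpenAI instance and that the endpoint
    supports the `tools` parameter for this model (an assumption).
    """
    messages = [{"role": "user", "content": user_prompt}]
    first = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content or ""  # The model answered directly without using a tool.

    messages.append(reply)  # Keep the assistant's tool request in the history.
    for call in reply.tool_calls:
        result = dispatch_tool_call(call.function.name, call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content or ""
```

With the client from the example above, a call such as run_one_tool_round(client, "What's the weather in Paris?") would perform at most one round of tool execution. A fuller agent would loop until the model stops requesting tools, and would validate arguments before running anything with side effects.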
Moonshot hints that visual input and even longer contexts are coming next. For now, K2 already shows that open models can handle agentic workloads, and with NetMind Inference you can drop it into production with a single API key from day one.
Ready to build the next generation of autonomous apps? Give Kimi K2 a spin!