New LLMs Available: All of Qwen3's Best Models
Three new Qwen3 models just landed on NetMind API. From long-context thinking models to lean coders, we’ve got them all plugged into our Model Library and ready to run.
Whether you’re building an LLM-powered assistant or a cost-efficient code interpreter, or just need to prototype with state-of-the-art models, there’s something here for you.
Models at a glance
Qwen3-235B-A22B-Thinking-2507
This is your go-to model for deep reasoning over long context windows. It’s the most capable general-purpose model in this drop.
- Total parameters: 235B
- Active parameters: 22B (Mixture-of-Experts)
- Native context window: 262,144 tokens (256K)
- Function calling: Supported
- Pricing: $0.20 (input) / $0.80 (output) per million tokens
- RPM: 60
- Feature: This model supports thinking mode only. The default chat template automatically inserts an opening <think> tag to force the model to reason, so it is normal for the output to contain only a closing </think> tag with no explicit opening <think> tag.
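Because the output may carry a closing </think> with no opening tag, client code usually has to separate the reasoning from the final answer itself. The following is a minimal sketch of one way to do that, assuming the raw completion arrives as a single string. The helper name `split_thinking` is hypothetical and not part of any NetMind or Qwen SDK.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Hypothetical helper: assumes the reasoning ends at the first
    closing </think> tag and that an opening <think> tag may be absent.
    """
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        # No closing tag found. This sketch treats the whole text as the
        # answer; a caller could instead treat it as truncated reasoning.
        return "", text.strip()
    # Tolerate an explicit opening tag in case one is present.
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()


raw = "The user wants 2 + 2, which is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)  # The answer is 4.
```

The sketch needs Python 3.9 or later for `str.removeprefix`.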
Qwen3-235B-A22B-Instruct-2507
Fine-tuned for instruction following and conversational reliability. If you want chat performance without compromising capability, this is it.
- Total parameters: 235B
- Active parameters: 22B (MoE)
- Native context window: 262,144 tokens (256K)
- Function calling: Supported
- Pricing: $0.20 (input) / $0.80 (output) per million tokens
- RPM: 60
- Feature: This model supports non-thinking mode only and does not generate <think></think> blocks in its output. Specifying enable_thinking=False is no longer required.
Qwen3-Coder-30B-A3B-Instruct
A smaller, sharper coding model designed for speed and affordability. Perfect for in-IDE agents, repo summarizers, or lightweight RAG pipelines.
- Total parameters: 30B
- Active parameters: 3B (MoE)
- Native context window: 128K tokens
- Function calling: Supported
- Pricing: $0.04 (input) / $0.17 (output) per million tokens
- RPM: 60
- Feature: The Qwen3-Coder series excels at coding tasks. This model is a lightweight alternative to the 480B-parameter Qwen3-Coder.
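To compare the three price points, here is a rough cost estimate based only on the per-million-token prices listed above. The dictionary keys are the model names as written in this post and may not match the exact model IDs the API expects. The function name `estimate_cost` and the token counts are illustrative assumptions.

```python
# Prices in USD per million tokens (input, output), copied from the listings above.
# Keys are display names from this post, not necessarily API model IDs.
PRICES = {
    "Qwen3-235B-A22B-Thinking-2507": (0.20, 0.80),
    "Qwen3-235B-A22B-Instruct-2507": (0.20, 0.80),
    "Qwen3-Coder-30B-A3B-Instruct": (0.04, 0.17),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative request: 50,000 input tokens and 5,000 output tokens.
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 50_000, 5_000):.5f}")
```

For that illustrative request, the two 235B models come to about $0.014 each and the coder model to about $0.00285. With the thinking model, reasoning tokens typically count toward output, so real output totals may run higher than a non-thinking estimate.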
All three models support function calling out of the box and run on NetMind’s ultra-efficient, low-latency infrastructure, so you can deploy with confidence whether you’re automating workflows, building tools, or just experimenting.
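Each model above lists an RPM of 60, so batch jobs may want to pace themselves on the client side. The sketch below spaces calls at least one second apart. It assumes RPM means requests per minute per account, which this post does not spell out. The class name `MinIntervalThrottle` is hypothetical, and the injectable `clock` and `sleep` parameters are there only to make it easy to test.

```python
import time


class MinIntervalThrottle:
    """Space calls evenly so that at most `rpm` calls start per minute."""

    def __init__(self, rpm: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm   # minimum seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay


throttle = MinIntervalThrottle(rpm=60)
# Call throttle.wait() immediately before each API request.
```

Even spacing is the simplest policy. If the service counts requests in a sliding window, a token bucket would allow short bursts while staying under the same limit.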
If you build something with these, we want to see it.