Qwen 3.6 Models

Four models, one family - from Mac M4 16GB to frontier performance at $0.40/M tokens

The Qwen 3.6 family spans proprietary hosted models and open-weight releases. Plus delivers 78.8% SWE-bench with 1M context and preserve_thinking at $0.40/$2.40 per million tokens (12x cheaper than Claude Opus 4.6). Max handles advanced reasoning and multi-modal tasks. The 27B dense model achieves 77.2% SWE-bench and 48.2 SkillsBench (beating Claude 4.5 Opus). The 35B A3B MoE runs on Mac M4 16GB at Q3 quantization with 20-40 tok/s. Every model shares the same architecture foundation and OpenAI-compatible API.

Model family

Choose the right Qwen 3.6 model for your use case and budget

From lightweight local deployment on a laptop to maximum hosted performance with 1M context, the Qwen 3.6 family covers every scale, budget, and deployment scenario.

Proprietary

Hosted models with maximum performance and 1M context

Plus and Max are proprietary models available via API. They offer the highest performance, 1M context windows, up to 65,536 output tokens, and features like preserve_thinking that are exclusive to the hosted tier. DashScope pricing starts at $0.40 per million input tokens - roughly 12x cheaper than Claude Opus 4.6. Batch invocation available at 50% of real-time pricing.

Available via OpenAI-compatible API through DashScope and OpenRouter (free tier available)

Open-weight

Run on your own hardware with full control - Apache 2.0 licensed

The 27B dense and 35B A3B MoE models are released under the Apache 2.0 license. Deploy locally with Ollama, vLLM, llama.cpp, SGLang, or KTransformers. The 27B fits 16GB VRAM with IQ4_XS GGUF and KV cache compression. The 35B A3B runs on Mac M4 16GB at Q3 quantization. Zero per-token costs, full data privacy, and freedom to fine-tune.

Available on HuggingFace, Ollama, and GGUF repositories

Qwen 3.6 Plus

Proprietary

Flagship proprietary model with 1M context, preserve_thinking for agentic workflows, and top-tier performance. 78.8% SWE-bench Verified, 61.6 Terminal-Bench 2.0, 56.6 SWE-bench Pro. DashScope pricing: $0.40 input / $2.40 output per million tokens, roughly 12x cheaper than Claude Opus 4.6. Batch invocation at 50% off. Up to 65,536 output tokens per request.

1M context window, preserve_thinking parameter, 65K output tokens, batch at 50% off

API access via DashScope and OpenRouter (free preview tier available)

Qwen 3.6 Max

Proprietary

High-performance proprietary model optimized for complex reasoning, multi-modal tasks, and document understanding. Strong across math, science, visual analysis, and long-document processing. Extended context window with advanced reasoning capabilities for the most demanding analytical tasks.

Extended context, multi-modal capabilities, advanced reasoning, document understanding

API access via DashScope and OpenRouter

Qwen 3.6 27B

Open-weight

Dense 27B parameter model delivering the best open-weight coding performance. 77.2% SWE-bench Verified, 59.3 Terminal-Bench 2.0, 83.9 LiveCodeBench, 48.2 SkillsBench (beats Claude 4.5 Opus at 45.3), 1487 QwenWebBench, 36.2 NL2Repo, 72.4 Claw-Eval. Can run on 16GB VRAM using IQ4_XS GGUF with KV cache compression supporting 100K context.

55.6GB FP16, 16GB VRAM with IQ4_XS + KV cache compression, dense architecture, Apache 2.0

HuggingFace, Ollama (qwen3.6:27b), GGUF downloads

Qwen 3.6 35B A3B

Open-weight

MoE model with 35B total / 3B active parameters. Near-27B performance in a consumer GPU footprint. 73.4% SWE-bench Verified, 51.5 Terminal-Bench 2.0, 80.4 LiveCodeBench, 68.7 Claw-Eval, 1397 QwenWebBench. Runs on Mac M4 16GB at Q3 quantization (~17GB). 20-40 tok/s on consumer hardware at 4-bit. Vision and multimodal supported.

~21GB Q4_K_M, ~17GB Q3_K_M (Mac M4 16GB), 3B active params, 20-40 tok/s, Apache 2.0

HuggingFace, Ollama (qwen3.6:35b-a3b), GGUF downloads

Qwen ecosystem

A unified model family for every deployment scenario and budget

From cloud API at $0.40/M tokens to Mac M4 laptop deployment, the Qwen 3.6 family provides consistent quality, compatible interfaces, and industry-leading price-performance across all deployment targets.

Qwen 3.6 Plus

78.8% SWE-bench, 1M context, $0.40/M tokens

Try Plus

Qwen 3.6 Max

Advanced reasoning and multi-modal

Try Max

Qwen 3.6 27B

77.2% SWE-bench, beats Claude on SkillsBench

Try 27B

Qwen 3.6 35B A3B

73.4% SWE-bench, Mac M4 16GB friendly

Try 35B

API Reference

OpenAI-compatible endpoints, preserve_thinking

View API

Community

Join the Qwen developer community

Join

Get started

Ready to explore the Qwen 3.6 family? Try free, deploy anywhere

Try any Qwen 3.6 model for free in the browser or via OpenRouter's free tier. Download open-weight models under Apache 2.0 to run on your own hardware. From Mac M4 16GB to production servers, from $0.40/M tokens API to zero-cost local deployment.