Qwen 3.6 35B A3B

35 billion parameters, 3 billion active - frontier MoE on consumer hardware

Qwen 3.6 35B A3B is a Mixture-of-Experts model that activates only 3B of its 35B parameters per token, drawn from a pool of 256 experts. With 73.4% on SWE-bench Verified, 92.7% on AIME 2026, and Apache 2.0 licensing, it brings frontier-class coding and reasoning to consumer GPUs.

Model variants

Open-weight MoE for local and cloud deployment

Qwen 3.6 35B A3B delivers strong performance with minimal active parameters. Choose the instruction-tuned variant for chat and coding, or the base model for fine-tuning.

Mixture-of-Experts Architecture

35B total parameters, 3B active per token, 256 experts

Qwen 3.6 35B A3B uses a Hybrid Gated DeltaNet + Gated Attention + MoE design with 256 experts, routing each token to 8 experts plus 1 shared expert. The 262K native context is extensible to 1M tokens, and the Apache 2.0 license permits commercial use under its attribution and notice terms.

With only 3B active parameters per token, this model runs efficiently on consumer GPUs while delivering performance that rivals much larger dense models.
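
To make the sparsity concrete, here is a back-of-the-envelope sketch of the share of parameters used per token and a rough forward-pass cost. It assumes the rounded figures quoted above (35B total, 3B active) and the common rule of thumb of about 2 FLOPs per active parameter per token; the function names are illustrative and not part of any Qwen tooling.

```python
TOTAL_PARAMS = 35e9   # rounded total parameter count quoted above
ACTIVE_PARAMS = 3e9   # rounded active-per-token count quoted above


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def approx_forward_flops(active: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token.

    Assumption, not a measurement: this ignores the cost of attention
    over the context and any routing overhead.
    """
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"approx forward FLOPs per token: {approx_forward_flops(ACTIVE_PARAMS):.1e}")
```

Under these assumptions, under a tenth of the weights take part in any single token's forward pass, which is where the inference-cost advantage comes from.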

Instruction-tuned

35B A3B Instruct

Optimized for conversational AI, coding, and agentic tasks on consumer hardware

Fine-tuned for instruction following and multi-turn dialogue with MoE efficiency

Available now - Apache 2.0

Pre-trained

35B A3B Base

Foundation MoE model for fine-tuning and specialized applications

Pre-trained with 256-expert MoE routing on diverse data

Available now - Apache 2.0

Capabilities

256 experts, 3B active - maximum efficiency meets strong performance

Qwen 3.6 35B A3B combines a massive expert pool with minimal active compute to deliver impressive coding, reasoning, and agentic capabilities on consumer-grade hardware.

Real-world software engineering

73.4% on SWE-bench Verified - resolving real GitHub issues with only 3B active parameters per token. That is within 3.8 points of the Qwen 3.6 27B dense model (77.2%), which activates about 9x as many parameters per token.

Terminal operations

51.5 on Terminal-Bench 2.0 for complex multi-step terminal workflows, covering debugging, system administration, and build pipeline tasks.

Advanced mathematics

92.7% on AIME 2026 - near-frontier math reasoning from a model that runs on consumer GPUs. Step-by-step thinking mode enables transparent problem solving.

262K to 1M context

262K native context window extensible to 1M tokens. Analyze entire codebases, long documents, and complex multi-turn conversations without truncation.

Competitive coding

80.4 on LiveCodeBench v6 for algorithmic problem solving. Strong code generation, debugging, and refactoring capabilities across multiple programming languages.

Open-weight freedom

The Apache 2.0 license permits commercial use, fine-tuning, and redistribution, subject to its attribution and notice terms. Full access to model weights supports research and customization.

Key highlights

Frontier MoE performance on consumer hardware

Qwen 3.6 35B A3B achieves strong results across coding, reasoning, and agentic benchmarks while activating only 3B parameters per token.

Top achievements

SWE-bench Verified: 73.4% - real-world software engineering
Terminal-Bench 2.0: 51.5 - complex terminal operations
AIME 2026: 92.7% - advanced mathematics
LiveCodeBench v6: 80.4 - competitive coding
Apache 2.0 license - fully open-weight

Technical specs

35B total parameters, 3B active per token
256 experts: 8 routed + 1 shared active per token
Hybrid Gated DeltaNet + Gated Attention + MoE architecture
262K native context, extensible to 1M tokens
Runs locally on consumer GPUs
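
Whether the model fits on a given GPU depends mostly on how the weights are stored, because all 35B parameters must be resident even though only 3B are active per token. The sketch below estimates weight memory at a few common precisions. The bytes-per-parameter values are standard for those formats, but the function and table names are illustrative, and the estimate deliberately ignores the KV cache, activations, and quantization metadata.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

TOTAL_PARAMS = 35e9  # rounded total parameter count from the specs above


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(TOTAL_PARAMS, precision)
        print(f"{precision}: ~{size:.1f} GB of weights")
```

By this estimate the weights take about 70 GB in bf16 and about 17.5 GB at 4 bits, which is why quantization matters for consumer-GPU deployment.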

Performance

Strong MoE performance at 3B active inference cost

Qwen 3.6 35B A3B scores 73.4% on SWE-bench Verified and 92.7% on AIME 2026 while activating only 3B parameters per token - bringing frontier-class capabilities to consumer hardware.

Qwen 3.6 35B A3B demonstrates that a sparse MoE architecture with 256 experts can deliver strong results across software engineering, mathematics, and competitive coding at a fraction of the per-token compute of a dense model of the same size.

Figure: Qwen 3.6 35B A3B performance comparison chart across coding and reasoning benchmarks (SWE-bench Verified, Terminal-Bench 2.0, AIME 2026, LiveCodeBench v6).

Benchmark comparison

Qwen 3.6 35B A3B vs the Qwen 3.6 family and competitors

Qwen 3.6 35B A3B scores within 1.4 to 10.1 points of the other models in the table across software engineering, terminal operations, and reasoning benchmarks, while activating only 3B parameters per token.

Benchmark	Qwen 3.6 35B A3B (MoE, featured)	Qwen 3.6 27B (dense)	Qwen 3.6 Plus (proprietary)	Qwen 3 235B A22B (MoE)
SWE-bench Verified (real-world software engineering)	73.4%	77.2%	78.8%	76.2%
Terminal-Bench 2.0 (terminal operations)	51.5	59.3	61.6	-
AIME 2026 (mathematics, no tools)	92.7%	94.1%	-	-
LiveCodeBench v6 (competitive coding)	80.4	83.9	-	-

Benchmark results from official Qwen 3.6 model card and HuggingFace evaluations.
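
For readers who want the gaps rather than the raw scores, the following sketch computes how many points a chosen model leads the 35B A3B model on each benchmark that both report. The scores are copied from the table above; the short model keys and the `gaps_to` helper are illustrative names introduced here.

```python
# Scores copied from the comparison table above; None marks a "-" entry.
# The short model keys are abbreviations chosen for this sketch.
SCORES = {
    "SWE-bench Verified": {"35B A3B": 73.4, "27B Dense": 77.2, "Plus": 78.8, "235B A22B": 76.2},
    "Terminal-Bench 2.0": {"35B A3B": 51.5, "27B Dense": 59.3, "Plus": 61.6, "235B A22B": None},
    "AIME 2026": {"35B A3B": 92.7, "27B Dense": 94.1, "Plus": None, "235B A22B": None},
    "LiveCodeBench v6": {"35B A3B": 80.4, "27B Dense": 83.9, "Plus": None, "235B A22B": None},
}


def gaps_to(baseline: str, featured: str = "35B A3B") -> dict:
    """Points by which `baseline` leads `featured` on each benchmark both report."""
    result = {}
    for bench, row in SCORES.items():
        if row[baseline] is not None and row[featured] is not None:
            result[bench] = round(row[baseline] - row[featured], 1)
    return result


if __name__ == "__main__":
    for model in ("27B Dense", "Plus", "235B A22B"):
        print(model, gaps_to(model))
```

Benchmarks that a model does not report are skipped rather than treated as zero, so the comparison only covers like-for-like entries.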

256-Expert MoE

35B capacity, 3B inference cost - runs on consumer GPUs

The Mixture-of-Experts design routes each token through 8 of 256 experts plus 1 shared expert. All 35B parameters must be loaded into memory, because the router can select any expert for any token, but only about 3B are active in a given forward pass. Combined with the Hybrid Gated DeltaNet + Gated Attention architecture, this keeps per-token compute low enough for consumer-GPU deployment.

3B active parameters per token from 35B total capacity
256 experts: 8 routed + 1 shared active per token
Runs locally on consumer GPUs with quantization
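
The routing described above can be illustrated with a toy sketch: pick the 8 highest-scoring experts out of 256, weight their outputs, and add the shared expert that runs for every token. The expert counts come from the text. Everything else is an assumption made for illustration, including the softmax over the selected logits, the scalar toy experts, and the function names; the actual Qwen router may normalize and combine differently.

```python
import math
import random

NUM_EXPERTS = 256  # routed expert pool size, from the text
TOP_K = 8          # routed experts selected per token, from the text


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest logits.

    Weights are a softmax over the selected logits only (an assumption).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(x, router_logits, experts, shared_expert):
    """Shared expert output plus the weighted sum of the selected experts."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * experts[idx](x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Toy scalar experts standing in for feed-forward blocks.
    experts = [lambda x, s=i: x * (1 + s / NUM_EXPERTS) for i in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print("experts used:", [i for i, _ in chosen])
    print("output:", moe_output(1.0, logits, experts, shared_expert=lambda x: x))
```

Only 9 of the 257 expert functions are evaluated per token here, which mirrors why active compute stays small even though every expert must be available for selection.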

Open Weight

Apache 2.0 - fully open for commercial use and fine-tuning

Qwen 3.6 35B A3B is released under the Apache 2.0 license, which permits commercial deployment, fine-tuning, and redistribution, subject to its attribution and notice terms. Download weights from HuggingFace and deploy on your own infrastructure with full control.