Qwen 3.6 35B A3B

35 billion parameters, 3 billion active - frontier MoE on consumer hardware

Qwen 3.6 35B A3B is a Mixture-of-Experts model that activates only 3B parameters per token from 256 experts. With 73.4% on SWE-bench Verified, 92.7% on AIME 2026, and Apache 2.0 licensing, it brings frontier-class coding and reasoning to consumer GPUs.

Model variants

Open-weight MoE for local and cloud deployment

Qwen 3.6 35B A3B delivers strong performance with minimal active parameters. Choose the instruction-tuned variant for chat and coding, or the base model for fine-tuning.

Mixture-of-Experts Architecture

35B total parameters, 3B active per token, 256 experts

Qwen 3.6 35B A3B uses a Hybrid Gated DeltaNet + Gated Attention + MoE design with 256 experts, routing 8 experts plus 1 shared expert per token. The 262K native context is extensible to 1M tokens, and the Apache 2.0 license enables unrestricted commercial use.

With only 3B active parameters per token, this model runs efficiently on consumer GPUs while delivering performance that rivals much larger dense models.

Instruction-tuned

35B A3B Instruct

Optimized for conversational AI, coding, and agentic tasks on consumer hardware

Fine-tuned for instruction following and multi-turn dialogue with MoE efficiency

Available now - Apache 2.0

Pre-trained

35B A3B Base

Foundation MoE model for fine-tuning and specialized applications

Pre-trained with 256-expert MoE routing on diverse data

Available now - Apache 2.0

Capabilities

256 experts, 3B active - maximum efficiency meets strong performance

Qwen 3.6 35B A3B combines a massive expert pool with minimal active compute to deliver impressive coding, reasoning, and agentic capabilities on consumer-grade hardware.

Real-world software engineering

73.4% on SWE-bench Verified - resolving real GitHub issues with only 3B active parameters per token. Competitive with models that use 10x more compute at inference time.

Terminal operations

51.5 on Terminal-Bench 2.0 for complex multi-step terminal workflows. Handles debugging, system administration, and build pipeline tasks with strong proficiency.

Advanced mathematics

92.7% on AIME 2026 - near-frontier math reasoning from a model that runs on consumer GPUs. Step-by-step thinking mode enables transparent problem solving.

262K to 1M context

262K native context window extensible to 1M tokens. Analyze entire codebases, long documents, and complex multi-turn conversations without truncation.

Competitive coding

80.4 on LiveCodeBench v6 for algorithmic problem solving. Strong code generation, debugging, and refactoring capabilities across multiple programming languages.

Open-weight freedom

Apache 2.0 license enables unrestricted commercial use, fine-tuning, and redistribution. Full transparency into model weights for research and customization.

Key highlights

Frontier MoE performance on consumer hardware

Qwen 3.6 35B A3B achieves strong results across coding, reasoning, and agentic benchmarks while activating only 3B parameters per token.

Top achievements

  • SWE-bench Verified: 73.4% - real-world software engineering
  • Terminal-Bench 2.0: 51.5 - complex terminal operations
  • AIME 2026: 92.7% - advanced mathematics
  • LiveCodeBench v6: 80.4 - competitive coding
  • Apache 2.0 license - fully open-weight

Technical specs

  • 35B total parameters, 3B active per token
  • 256 experts: 8 routed + 1 shared active per token
  • Hybrid Gated DeltaNet + Gated Attention + MoE architecture
  • 262K native context, extensible to 1M tokens
  • Runs locally on consumer GPUs

Performance

Strong MoE performance at 3B active inference cost

Qwen 3.6 35B A3B scores 73.4% on SWE-bench Verified and 92.7% on AIME 2026 while activating only 3B parameters per token - bringing frontier-class capabilities to consumer hardware.

Qwen 3.6 35B A3B demonstrates that sparse MoE architectures with 256 experts can deliver impressive results across software engineering, mathematics, and competitive coding at a fraction of the compute cost.

Qwen 3.6 35B A3B performance comparison chart across coding and reasoning benchmarks

SWE-bench Verified: 73.4% with only 3B active parameters

Terminal-Bench 2.0: 51.5 for terminal operations

AIME 2026: 92.7% on advanced mathematics

LiveCodeBench v6: 80.4 competitive coding

Apache 2.0 open-weight license

Benchmark comparison

Qwen 3.6 35B A3B vs the Qwen 3.6 family and competitors

Qwen 3.6 35B A3B delivers strong performance across software engineering, terminal operations, and reasoning benchmarks at minimal inference cost.

Benchmark
Qwen 3.6 35B A3B
MoE
Featured
Qwen 3.6 27B
Dense
Qwen 3.6 Plus
Proprietary
Qwen 3 235B A22B
MoE
SWE-bench Verified
Real-world software engineering
73.4%77.2%78.8%76.2%
Terminal-Bench 2.0
Terminal operations
51.559.361.6-
AIME 2026
Mathematics
No tools
92.7%94.1%--
LiveCodeBench v6
Competitive coding
80.483.9--

Benchmark results from official Qwen 3.6 model card and HuggingFace evaluations.

256-Expert MoE

35B capacity, 3B inference cost - runs on consumer GPUs

The Mixture-of-Experts design routes each token through 8 of 256 experts plus 1 shared expert. All 35B parameters load for routing diversity, but only 3B activate per forward pass. Combined with the Hybrid Gated DeltaNet + Gated Attention architecture, this enables consumer-GPU deployment with strong performance.

  • 3B active parameters per token from 35B total capacity
  • 256 experts: 8 routed + 1 shared active per token
  • Runs locally on consumer GPUs with quantization
35B capacity, 3B inference cost - runs on consumer GPUs

Open Weight

Apache 2.0 - fully open for commercial use and fine-tuning

Qwen 3.6 35B A3B is released under the Apache 2.0 license, enabling unrestricted commercial deployment, fine-tuning, and redistribution. Download weights from HuggingFace and deploy on your own infrastructure with full control.

  • Apache 2.0 license - no usage restrictions
  • Full weight access for fine-tuning and customization
  • Community-driven ecosystem with broad framework support

Qwen ecosystem

Part of the Qwen 3.6 model family

Qwen 3.6 35B A3B is the open-weight MoE variant in Alibaba's latest model family, designed for maximum accessibility on consumer hardware.

Documentation

Complete guides for integration and deployment

Read docs

HuggingFace

Download Apache 2.0 weights and explore the model hub

Download

Model Card

Technical specifications and evaluation results

View details

GitHub Repository

Source code, examples, and community contributions

View code

API Access

OpenAI-compatible API endpoints for cloud deployment

Get started

Community

Join the Qwen developer community

Join

Get started

Ready to build with Qwen 3.6 35B A3B?

Start chatting instantly for free, or download open-weight models under Apache 2.0 for self-hosted deployment on consumer hardware.