Qwen Code

Agentic coding that resolves real GitHub issues, masters terminal workflows, and beats Claude on SkillsBench

The Qwen 3.6 family delivers elite coding performance across every dimension. The Plus model scores 78.8% on SWE-bench Verified and 61.6 on Terminal-Bench 2.0. The 27B dense model achieves 77.2% SWE-bench, 48.2 on SkillsBench (beating Claude 4.5 Opus at 45.3), and 1487 on QwenWebBench for frontend code generation. The 35B A3B MoE brings 73.4% SWE-bench in a consumer GPU footprint. All models work with Claude Code, OpenClaw, Aider, and Continue.dev via the OpenAI-compatible API. preserve_thinking maintains reasoning state across agent loop iterations for iterative development.

Coding capabilities

Full-stack coding from terminal to production - with thinking preservation

Qwen 3.6 models excel at every stage of the software development lifecycle. From understanding large codebases and generating code to debugging, testing, and deploying through terminal workflows. The preserve_thinking parameter maintains reasoning context across iterative development cycles.

Agentic coding (SWE-bench)

Autonomously resolves real-world GitHub issues end-to-end. 78.8% on SWE-bench Verified (Plus) and 77.2% (27B) demonstrate the ability to navigate repositories, identify root causes, implement fixes, and submit working patches without human intervention. The 35B A3B achieves 73.4% in a consumer GPU footprint. These scores place Qwen 3.6 among the top models for autonomous software engineering.

Frontend code generation (QwenWebBench)

The 27B model scores 1487 on QwenWebBench and the 35B A3B scores 1397, demonstrating strong frontend code generation capabilities. Generates complete React, Vue, and Next.js components with proper TypeScript typing, accessibility attributes, responsive layouts, and design system integration. Handles CSS-in-JS, Tailwind CSS, and component library patterns. The preserve_thinking parameter helps maintain design context across multi-file frontend scaffolding.

Terminal operations (Terminal-Bench)

61.6 on Terminal-Bench 2.0 (Plus) and 59.3 (27B) - expert-level terminal mastery. Handles complex multi-step shell workflows, system administration tasks, debugging sessions, CI/CD pipeline management, Docker orchestration, and infrastructure automation. The 35B A3B scores 51.5, still strong for a consumer GPU model.

SkillsBench - beats Claude 4.5 Opus

The 27B model scores 48.2 on SkillsBench, beating Claude 4.5 Opus at 45.3. SkillsBench evaluates practical coding skills including code review, refactoring, API design, testing strategy, and architectural decision-making. This benchmark measures the kind of nuanced engineering judgment that matters in real-world development, not just code generation.

Repository-level reasoning (NL2Repo)

The 27B model scores 36.2 on NL2Repo, demonstrating the ability to translate natural language descriptions into complete repository structures. Understands cross-file dependencies, module boundaries, architectural patterns, and project conventions across entire repositories. The 1M context window (Plus) enables processing complete codebases in a single pass for comprehensive understanding.

Code generation (LiveCodeBench)

83.9 on LiveCodeBench (27B) and 80.4 (35B A3B) for competitive-grade code generation. Produces clean, idiomatic code across Python, TypeScript, Rust, Go, Java, C++, and 20+ languages with proper error handling, documentation, and test coverage. Handles algorithmic problems, data structure implementations, and system design challenges.

Coding tool integration

Works with Claude Code, OpenClaw, Aider, Continue.dev, and Qwen Code via the OpenAI-compatible API. Set the base URL to your DashScope, OpenRouter, or local Ollama endpoint and start coding immediately. The preserve_thinking parameter is especially valuable in Claude Code and OpenClaw agent loops where maintaining reasoning state across iterations reduces redundant re-reasoning and improves fix accuracy.

Debugging, testing, and Claw-Eval

The 27B model scores 72.4 on Claw-Eval average and the 35B A3B scores 68.7, measuring end-to-end agentic coding capability. Traces bugs through complex call stacks, identifies root causes from error logs, and generates comprehensive test suites. Supports unit tests, integration tests, end-to-end testing frameworks, and property-based testing across all major languages and frameworks.

Coding benchmarks

Top-tier results across every coding evaluation

Qwen 3.6 models consistently rank among the best on software engineering, code generation, terminal operations, and practical coding skill benchmarks.

Software engineering benchmarks

  • SWE-bench Verified: 78.8% (Plus) / 77.2% (27B) / 73.4% (35B A3B)
  • Terminal-Bench 2.0: 61.6 (Plus) / 59.3 (27B) / 51.5 (35B A3B)
  • SkillsBench: 48.2 (27B) - beats Claude 4.5 Opus (45.3)
  • Claw-Eval Avg: 72.4 (27B) / 68.7 (35B A3B)
  • LiveCodeBench: 83.9 (27B) / 80.4 (35B A3B)
  • QwenWebBench: 1487 (27B) / 1397 (35B A3B) - frontend generation
  • NL2Repo: 36.2 (27B) - natural language to repository
  • SWE-bench Pro: 56.6 (Plus)

Tool and model options

  • Works with: Claude Code, OpenClaw, Aider, Continue.dev, Qwen Code
  • 27B Dense: Best open-weight coding, 77.2% SWE-bench
  • 35B A3B MoE: 73.4% SWE-bench on consumer GPU (~21GB VRAM)
  • Plus: 78.8% SWE-bench, 1M context, preserve_thinking
  • Frontend: React, Vue, Next.js with TypeScript support
  • preserve_thinking: maintains reasoning across agent iterations

Qwen ecosystem

Coding models for every scale - from consumer GPU to frontier performance

From the 35B A3B that runs on a single consumer GPU to the Plus with 1M context and preserve_thinking, the Qwen 3.6 family covers every coding deployment scenario. All models work with Claude Code, OpenClaw, Aider, and Continue.dev.

Qwen 3.6 27B

Dense, 77.2% SWE-bench, 48.2 SkillsBench

Learn more

Qwen 3.6 35B A3B

MoE, 73.4% SWE-bench, consumer GPU

Learn more

Qwen 3.6 Plus

78.8% SWE-bench, 1M context, preserve_thinking

Learn more

Ollama setup

Run Qwen Code locally in one command

Get started

API reference

OpenAI-compatible endpoints for coding tasks

View API

Community

Join the Qwen developer community

Join

Start coding

Ready to code with Qwen 3.6? 78.8% SWE-bench, works with your favorite tools

Start chatting for free or integrate via the OpenAI-compatible API. Works with Claude Code, OpenClaw, Aider, and Continue.dev. Choose from open-weight models you can run locally or the Plus for maximum performance with preserve_thinking.