Qwen 3.6 API

OpenAI-compatible API with preserve_thinking - 12x cheaper than Claude Opus 4.6

Access Qwen 3.6 Plus, Max, and open-weight models through an OpenAI-compatible API. DashScope pricing starts at $0.40 per million input tokens and $2.40 per million output tokens for qwen-plus, making it roughly 12x cheaper than Claude Opus 4.6. Drop-in replacement for existing OpenAI integrations with the preserve_thinking parameter for maintaining reasoning state across agent loops. 1M context window with up to 65,536 output tokens. Available via DashScope direct and OpenRouter with a free preview tier.

Start Chatting View API docs

API guide

Integrate Qwen 3.6 into your applications with minimal code changes

The Qwen 3.6 API follows the OpenAI chat completions format, making it a drop-in replacement for existing integrations. The preserve_thinking extension adds agentic capabilities without breaking compatibility. Batch invocation on DashScope runs at 50% of real-time pricing.

OpenAI compatibility

Standard chat completions endpoint with messages, tools, and streaming support. Switch from OpenAI by changing the base URL and API key - no code changes required. Supports function calling, JSON mode, structured outputs, and vision/multimodal inputs. Compatible with Claude Code, OpenClaw, Aider, Continue.dev, and any tool that supports the OpenAI API format. The 1M context window supports up to 65,536 output tokens per request.

DashScope pricing

Direct API access through Alibaba Cloud's DashScope platform with competitive pricing. qwen-plus: $0.40 per million input tokens, $2.40 per million output tokens - roughly 12x cheaper than Claude Opus 4.6 for equivalent tasks. Batch invocation available at 50% of real-time pricing for non-latency-sensitive workloads like data processing, evaluation, and bulk generation. Sign up for an API key at dashscope.aliyuncs.com.

OpenRouter integration

Access Qwen 3.6 models through OpenRouter's unified API alongside 200+ other models. Free preview tier available at qwen/qwen3.6-plus:free with no credit card required. Paid tier uses pass-through pricing plus a 5.5% fee. Single API key for multi-provider access with automatic fallback and load balancing. OpenRouter handles rate limiting and provides usage analytics across all your model providers.

preserve_thinking parameter

First-of-its-kind API extension that maintains the model's internal reasoning state across agent loop iterations. Set preserve_thinking: true in your API request to reduce redundant re-reasoning in multi-step workflows. This improves accuracy and reduces token usage in agentic pipelines by 15-30% on typical multi-step tasks. Essential for building reliable agent loops with Claude Code, OpenClaw, and custom agentic frameworks.

Batch invocation (50% off)

DashScope offers batch invocation at 50% of real-time pricing for workloads that don't require immediate responses. Submit batches of requests and retrieve results asynchronously. Ideal for dataset processing, model evaluation, content generation pipelines, and any workflow where latency is not critical. Batch jobs support the same API format as real-time requests.

1M context window

Qwen 3.6 Plus supports a 1M token context window - enough to process entire codebases, long research papers, legal documents, and extended multi-turn conversations in a single pass. Combined with up to 65,536 output tokens, this enables generating complete files, detailed analyses, and comprehensive reports without truncation. The context window is available on both DashScope and OpenRouter.

SDK and framework support

Works with any OpenAI-compatible SDK: Python (openai), Node.js (openai), Go, Rust, Java, and more. LangChain, LlamaIndex, AutoGen, CrewAI, and Semantic Kernel integrations available out of the box. No custom SDK required - just change the base URL. The DashScope Python SDK also provides native access with additional features like batch management and usage tracking.

Self-hosted API option

For teams that need full data control, deploy Qwen 3.6 open-weight models (27B, 35B A3B) with vLLM, SGLang, or KTransformers to create your own OpenAI-compatible API endpoint. Same API format as DashScope and OpenRouter, so your application code works without changes. Zero per-token costs after hardware investment.

API reference

Quick start with the Qwen 3.6 API

Essential endpoints, pricing, parameters, and configuration for getting started with the Qwen 3.6 API via DashScope or OpenRouter.

Key endpoints and features

POST /v1/chat/completions - Chat completions (streaming supported)
POST /v1/embeddings - Text embeddings
GET /v1/models - List available models
preserve_thinking: true - Enable reasoning state persistence
1M context window, up to 65,536 output tokens
Function calling, JSON mode, structured outputs, vision

Pricing (DashScope)

qwen-plus input: $0.40 per million tokens
qwen-plus output: $2.40 per million tokens
Batch invocation: 50% of real-time pricing
~12x cheaper than Claude Opus 4.6 for equivalent tasks
OpenRouter free tier: qwen/qwen3.6-plus:free (no credit card)
OpenRouter paid: pass-through pricing + 5.5% fee

Available models

qwen-3.6-plus - Flagship, 1M context, preserve_thinking
qwen-3.6-max - Advanced reasoning, multi-modal
qwen-3.6-27b - Dense open-weight, best coding performance
qwen-3.6-35b-a3b - MoE open-weight, cost-effective
Self-hosted via vLLM, SGLang, KTransformers

Start Chatting API documentation

Get started

Start building with the Qwen 3.6 API in minutes

Get your API key and make your first request. The OpenAI-compatible format means you can start with familiar tools and SDKs.

DashScope quickstart

OpenRouter setup

Access Qwen 3.6 through OpenRouter - free tier available

Python SDK guide

Use the standard OpenAI Python SDK with Qwen 3.6

Node.js SDK guide

Integrate Qwen 3.6 into Node.js applications

Batch invocation guide

Submit batch jobs at 50% pricing for bulk workloads

tag

Pricing calculator

Estimate costs for your workload vs Claude, GPT-4o, Gemini

Advanced usage

Build agentic workflows with preserve_thinking and tool calling

Leverage the preserve_thinking parameter, function calling, and 1M context for complex multi-step agent pipelines and production applications.

Agent frameworks

LangChain, AutoGen, CrewAI, and Semantic Kernel integration

Tool calling guide

Function calling, MCP protocol, and structured tool use

Streaming guide

Server-sent events for real-time responses and progress

Claude Code integration

Use Qwen 3.6 as a backend for Claude Code via API

Self-hosted deployment

Deploy with vLLM or SGLang for zero per-token costs

Qwen ecosystem

One API format, multiple access points, industry-leading pricing

Access Qwen 3.6 through DashScope ($0.40/$2.40 per M tokens), OpenRouter (free tier available), or self-hosted vLLM - all using the same OpenAI-compatible API format with preserve_thinking support.

Explore all models API documentation

DashScope

Direct API, $0.40/$2.40 per M tokens

OpenRouter

Unified API with free tier available

Get started

Self-hosted vLLM

Run your own API endpoint, zero per-token cost

Deploy

Python SDK

Standard OpenAI Python library, drop-in replacement

Install

API reference

Complete endpoint and parameter documentation

Read docs

Pricing

Usage-based pricing, batch at 50% off

View pricing

Get started

Ready to integrate Qwen 3.6? Start with the free tier, scale with $0.40/M token pricing

Start chatting for free, then integrate via the OpenAI-compatible API. Drop-in replacement with preserve_thinking for agentic workflows. DashScope at $0.40/$2.40 per million tokens or OpenRouter free tier - no credit card required.

Start Chatting API documentation