Qwen 3.6 API
OpenAI-compatible API with preserve_thinking - 12x cheaper than Claude Opus 4.6
Access Qwen 3.6 Plus, Max, and open-weight models through an OpenAI-compatible API. DashScope pricing starts at $0.40 per million input tokens and $2.40 per million output tokens for qwen-plus, making it roughly 12x cheaper than Claude Opus 4.6. Drop-in replacement for existing OpenAI integrations with the preserve_thinking parameter for maintaining reasoning state across agent loops. 1M context window with up to 65,536 output tokens. Available via DashScope direct and OpenRouter with a free preview tier.
API guide
Integrate Qwen 3.6 into your applications with minimal code changes
The Qwen 3.6 API follows the OpenAI chat completions format, making it a drop-in replacement for existing integrations. The preserve_thinking extension adds agentic capabilities without breaking compatibility. Batch invocation on DashScope runs at 50% of real-time pricing.
OpenAI compatibility
Standard chat completions endpoint with messages, tools, and streaming support. Switch from OpenAI by changing the base URL and API key - no code changes required. Supports function calling, JSON mode, structured outputs, and vision/multimodal inputs. Compatible with Claude Code, OpenClaw, Aider, Continue.dev, and any tool that supports the OpenAI API format. The 1M context window supports up to 65,536 output tokens per request.
DashScope pricing
Direct API access through Alibaba Cloud's DashScope platform with competitive pricing. qwen-plus: $0.40 per million input tokens, $2.40 per million output tokens - roughly 12x cheaper than Claude Opus 4.6 for equivalent tasks. Batch invocation available at 50% of real-time pricing for non-latency-sensitive workloads like data processing, evaluation, and bulk generation. Sign up for an API key at dashscope.aliyuncs.com.
OpenRouter integration
Access Qwen 3.6 models through OpenRouter's unified API alongside 200+ other models. Free preview tier available at qwen/qwen3.6-plus:free with no credit card required. Paid tier uses pass-through pricing plus a 5.5% fee. Single API key for multi-provider access with automatic fallback and load balancing. OpenRouter handles rate limiting and provides usage analytics across all your model providers.
preserve_thinking parameter
First-of-its-kind API extension that maintains the model's internal reasoning state across agent loop iterations. Set preserve_thinking: true in your API request to reduce redundant re-reasoning in multi-step workflows. This improves accuracy and reduces token usage in agentic pipelines by 15-30% on typical multi-step tasks. Essential for building reliable agent loops with Claude Code, OpenClaw, and custom agentic frameworks.
Batch invocation (50% off)
DashScope offers batch invocation at 50% of real-time pricing for workloads that don't require immediate responses. Submit batches of requests and retrieve results asynchronously. Ideal for dataset processing, model evaluation, content generation pipelines, and any workflow where latency is not critical. Batch jobs support the same API format as real-time requests.
1M context window
Qwen 3.6 Plus supports a 1M token context window - enough to process entire codebases, long research papers, legal documents, and extended multi-turn conversations in a single pass. Combined with up to 65,536 output tokens, this enables generating complete files, detailed analyses, and comprehensive reports without truncation. The context window is available on both DashScope and OpenRouter.
SDK and framework support
Works with any OpenAI-compatible SDK: Python (openai), Node.js (openai), Go, Rust, Java, and more. LangChain, LlamaIndex, AutoGen, CrewAI, and Semantic Kernel integrations available out of the box. No custom SDK required - just change the base URL. The DashScope Python SDK also provides native access with additional features like batch management and usage tracking.
Self-hosted API option
For teams that need full data control, deploy Qwen 3.6 open-weight models (27B, 35B A3B) with vLLM, SGLang, or KTransformers to create your own OpenAI-compatible API endpoint. Same API format as DashScope and OpenRouter, so your application code works without changes. Zero per-token costs after hardware investment.
API reference
Quick start with the Qwen 3.6 API
Essential endpoints, pricing, parameters, and configuration for getting started with the Qwen 3.6 API via DashScope or OpenRouter.
Key endpoints and features
- POST /v1/chat/completions - Chat completions (streaming supported)
- POST /v1/embeddings - Text embeddings
- GET /v1/models - List available models
- preserve_thinking: true - Enable reasoning state persistence
- 1M context window, up to 65,536 output tokens
- Function calling, JSON mode, structured outputs, vision
Pricing (DashScope)
- qwen-plus input: $0.40 per million tokens
- qwen-plus output: $2.40 per million tokens
- Batch invocation: 50% of real-time pricing
- ~12x cheaper than Claude Opus 4.6 for equivalent tasks
- OpenRouter free tier: qwen/qwen3.6-plus:free (no credit card)
- OpenRouter paid: pass-through pricing + 5.5% fee
Available models
- qwen-3.6-plus - Flagship, 1M context, preserve_thinking
- qwen-3.6-max - Advanced reasoning, multi-modal
- qwen-3.6-27b - Dense open-weight, best coding performance
- qwen-3.6-35b-a3b - MoE open-weight, cost-effective
- Self-hosted via vLLM, SGLang, KTransformers
Get started
Start building with the Qwen 3.6 API in minutes
Get your API key and make your first request. The OpenAI-compatible format means you can start with familiar tools and SDKs.
Sign up and get your API key from Alibaba Cloud DashScope
Access Qwen 3.6 through OpenRouter - free tier available
Use the standard OpenAI Python SDK with Qwen 3.6
Integrate Qwen 3.6 into Node.js applications
Submit batch jobs at 50% pricing for bulk workloads
Estimate costs for your workload vs Claude, GPT-4o, Gemini
Advanced usage
Build agentic workflows with preserve_thinking and tool calling
Leverage the preserve_thinking parameter, function calling, and 1M context for complex multi-step agent pipelines and production applications.
LangChain, AutoGen, CrewAI, and Semantic Kernel integration
Function calling, MCP protocol, and structured tool use
Server-sent events for real-time responses and progress
Use Qwen 3.6 as a backend for Claude Code via API
Deploy with vLLM or SGLang for zero per-token costs
Qwen ecosystem
One API format, multiple access points, industry-leading pricing
Access Qwen 3.6 through DashScope ($0.40/$2.40 per M tokens), OpenRouter (free tier available), or self-hosted vLLM - all using the same OpenAI-compatible API format with preserve_thinking support.
Get started
Ready to integrate Qwen 3.6? Start with the free tier, scale with $0.40/M token pricing
Start chatting for free, then integrate via the OpenAI-compatible API. Drop-in replacement with preserve_thinking for agentic workflows. DashScope at $0.40/$2.40 per million tokens or OpenRouter free tier - no credit card required.