Qwen 3.6 API

OpenAI-compatible API with preserve_thinking - 12x cheaper than Claude Opus 4.6

Access Qwen 3.6 Plus, Max, and open-weight models through an OpenAI-compatible API. DashScope pricing starts at $0.40 per million input tokens and $2.40 per million output tokens for qwen-plus, making it roughly 12x cheaper than Claude Opus 4.6. Drop-in replacement for existing OpenAI integrations with the preserve_thinking parameter for maintaining reasoning state across agent loops. 1M context window with up to 65,536 output tokens. Available via DashScope direct and OpenRouter with a free preview tier.

API guide

Integrate Qwen 3.6 into your applications with minimal code changes

The Qwen 3.6 API follows the OpenAI chat completions format, making it a drop-in replacement for existing integrations. The preserve_thinking extension adds agentic capabilities without breaking compatibility. Batch invocation on DashScope runs at 50% of real-time pricing.

OpenAI compatibility

Standard chat completions endpoint with messages, tools, and streaming support. Switch from OpenAI by changing the base URL and API key - no code changes required. Supports function calling, JSON mode, structured outputs, and vision/multimodal inputs. Compatible with Claude Code, OpenClaw, Aider, Continue.dev, and any tool that supports the OpenAI API format. The 1M context window supports up to 65,536 output tokens per request.

DashScope pricing

Direct API access through Alibaba Cloud's DashScope platform with competitive pricing. qwen-plus: $0.40 per million input tokens, $2.40 per million output tokens - roughly 12x cheaper than Claude Opus 4.6 for equivalent tasks. Batch invocation available at 50% of real-time pricing for non-latency-sensitive workloads like data processing, evaluation, and bulk generation. Sign up for an API key at dashscope.aliyuncs.com.

OpenRouter integration

Access Qwen 3.6 models through OpenRouter's unified API alongside 200+ other models. Free preview tier available at qwen/qwen3.6-plus:free with no credit card required. Paid tier uses pass-through pricing plus a 5.5% fee. Single API key for multi-provider access with automatic fallback and load balancing. OpenRouter handles rate limiting and provides usage analytics across all your model providers.

preserve_thinking parameter

First-of-its-kind API extension that maintains the model's internal reasoning state across agent loop iterations. Set preserve_thinking: true in your API request to reduce redundant re-reasoning in multi-step workflows. This improves accuracy and reduces token usage in agentic pipelines by 15-30% on typical multi-step tasks. Essential for building reliable agent loops with Claude Code, OpenClaw, and custom agentic frameworks.

Batch invocation (50% off)

DashScope offers batch invocation at 50% of real-time pricing for workloads that don't require immediate responses. Submit batches of requests and retrieve results asynchronously. Ideal for dataset processing, model evaluation, content generation pipelines, and any workflow where latency is not critical. Batch jobs support the same API format as real-time requests.

1M context window

Qwen 3.6 Plus supports a 1M token context window - enough to process entire codebases, long research papers, legal documents, and extended multi-turn conversations in a single pass. Combined with up to 65,536 output tokens, this enables generating complete files, detailed analyses, and comprehensive reports without truncation. The context window is available on both DashScope and OpenRouter.

SDK and framework support

Works with any OpenAI-compatible SDK: Python (openai), Node.js (openai), Go, Rust, Java, and more. LangChain, LlamaIndex, AutoGen, CrewAI, and Semantic Kernel integrations available out of the box. No custom SDK required - just change the base URL. The DashScope Python SDK also provides native access with additional features like batch management and usage tracking.

Self-hosted API option

For teams that need full data control, deploy Qwen 3.6 open-weight models (27B, 35B A3B) with vLLM, SGLang, or KTransformers to create your own OpenAI-compatible API endpoint. Same API format as DashScope and OpenRouter, so your application code works without changes. Zero per-token costs after hardware investment.

API reference

Quick start with the Qwen 3.6 API

Essential endpoints, pricing, parameters, and configuration for getting started with the Qwen 3.6 API via DashScope or OpenRouter.

Key endpoints and features

  • POST /v1/chat/completions - Chat completions (streaming supported)
  • POST /v1/embeddings - Text embeddings
  • GET /v1/models - List available models
  • preserve_thinking: true - Enable reasoning state persistence
  • 1M context window, up to 65,536 output tokens
  • Function calling, JSON mode, structured outputs, vision

Pricing (DashScope)

  • qwen-plus input: $0.40 per million tokens
  • qwen-plus output: $2.40 per million tokens
  • Batch invocation: 50% of real-time pricing
  • ~12x cheaper than Claude Opus 4.6 for equivalent tasks
  • OpenRouter free tier: qwen/qwen3.6-plus:free (no credit card)
  • OpenRouter paid: pass-through pricing + 5.5% fee

Available models

  • qwen-3.6-plus - Flagship, 1M context, preserve_thinking
  • qwen-3.6-max - Advanced reasoning, multi-modal
  • qwen-3.6-27b - Dense open-weight, best coding performance
  • qwen-3.6-35b-a3b - MoE open-weight, cost-effective
  • Self-hosted via vLLM, SGLang, KTransformers

Qwen ecosystem

One API format, multiple access points, industry-leading pricing

Access Qwen 3.6 through DashScope ($0.40/$2.40 per M tokens), OpenRouter (free tier available), or self-hosted vLLM - all using the same OpenAI-compatible API format with preserve_thinking support.

DashScope

Direct API, $0.40/$2.40 per M tokens

Sign up

OpenRouter

Unified API with free tier available

Get started

Self-hosted vLLM

Run your own API endpoint, zero per-token cost

Deploy

Python SDK

Standard OpenAI Python library, drop-in replacement

Install

API reference

Complete endpoint and parameter documentation

Read docs

Pricing

Usage-based pricing, batch at 50% off

View pricing

Get started

Ready to integrate Qwen 3.6? Start with the free tier, scale with $0.40/M token pricing

Start chatting for free, then integrate via the OpenAI-compatible API. Drop-in replacement with preserve_thinking for agentic workflows. DashScope at $0.40/$2.40 per million tokens or OpenRouter free tier - no credit card required.