Qwen 3.6 + Ollama

Run Qwen 3.6 locally with a single command - no configuration required

Ollama makes running Qwen 3.6 as simple as 'ollama run qwen3.6:35b-a3b'. Automatic GPU detection, model downloading, and quantization selection. Supports both the 27B dense and 35B A3B MoE models with NVIDIA CUDA and Apple Metal acceleration. Expect 20-40 tokens per second on consumer hardware for the 35B A3B 4-bit model. The OpenAI-compatible API at localhost:11434 integrates directly with Claude Code, Aider, Continue.dev, and other coding tools. Vision and multimodal input supported out of the box - a key fix over Qwen 3.5 where vision and tool calling were broken.

Start Chatting View model tags

Ollama guide

From install to inference in under 5 minutes

Ollama handles the complexity of local model deployment - GPU detection, memory management, quantization, and API serving - so you can focus on using the model. Qwen 3.6 fixes the vision and tool calling issues that plagued Qwen 3.5 on Ollama.

One-command setup

Install Ollama, then run 'ollama run qwen3.6:35b-a3b' (default tag) or 'ollama run qwen3.6:27b'. Automatic model download, GPU detection, and optimal quantization selection. Works on macOS (Apple Silicon with Metal), Linux (NVIDIA CUDA), and Windows (WSL2 or native). The 35B A3B is the default recommended model for most users due to its balance of quality and hardware requirements.

Model tag selection

Choose the right model variant: 'qwen3.6:35b-a3b' for consumer GPUs (default tag), 'qwen3.6:27b' for maximum performance on workstation hardware, 'qwen3.6:35b-a3b-q4_k_m' for specific quantization control, or 'qwen3.6:35b-a3b-q3_k_m' for tighter VRAM budgets (~17GB). Tags map directly to GGUF quantization levels. Use 'ollama list' to see downloaded models and 'ollama show qwen3.6:35b-a3b' to inspect model details.

VRAM requirements and quantization

35B A3B quantization options: Q2_K (~13GB, fastest, lowest quality), Q3_K_M (~17GB, good for Mac M4 16GB), Q4_K_M (~21GB, balanced quality/speed on 24GB GPU), Q5_K_M (~25GB), Q8_0 (~35GB, near-lossless). 27B dense: Q4_K_M ~16GB, needs 24GB+ GPU. BF16 full precision for 35B A3B requires ~70GB VRAM. Community reports confirm Mac M4 16GB runs the 35B A3B at Q3 quantization successfully.

Vision and multimodal support

Qwen 3.6 models support multi-modal inputs through Ollama - a major improvement over Qwen 3.5 where vision was broken. Pass images alongside text prompts for code screenshot analysis, UI review, diagram understanding, architecture diagram parsing, and visual debugging workflows. Use the /image command in Ollama chat or pass base64-encoded images via the API.

Performance benchmarks on consumer hardware

Unsloth community benchmarks show 20-40 tokens per second on local rigs for the 35B A3B 4-bit model. Mac M4 16GB users report usable speeds with Q3 quantization. RTX 4090 24GB handles Q4_K_M with room for context. RTX 6000 96GB can run full precision deployment. Performance scales linearly with GPU memory bandwidth - faster memory means faster inference.

Modelfile customization

Create custom Modelfiles to configure system prompts, temperature, context length (num_ctx), GPU layer offloading (num_gpu), batch size (num_batch), and thread count. Set num_ctx up to 131072 for long-context tasks. Customize the chat template for specific use cases like coding assistants, technical writing, or agentic workflows. Modelfiles are plain text and version-controllable.

Tool calling and function support

Qwen 3.6 on Ollama supports tool calling and function invocation - another fix over Qwen 3.5 where tool calling was broken. Define tools in the OpenAI-compatible format and the model will generate structured function calls. This enables integration with agentic frameworks like LangChain, AutoGen, and CrewAI through the localhost:11434 endpoint.

Coding tool integration

Ollama exposes an OpenAI-compatible API at localhost:11434. Connect directly to Claude Code (via OpenAI-compatible API), OpenClaw, Aider, Continue.dev, Cursor, and other coding tools that support custom OpenAI endpoints. Set the base URL to http://localhost:11434/v1 and use any string as the API key. The Qwen 3.6 models support the same chat completions format as OpenAI.

Quick reference

Ollama commands, model tags, and hardware requirements

Essential commands, configuration options, and hardware requirements for running Qwen 3.6 with Ollama on different platforms.

Essential commands

ollama run qwen3.6:35b-a3b - Run MoE model (default tag, consumer GPU)
ollama run qwen3.6:27b - Run dense model (workstation GPU)
ollama pull qwen3.6:35b-a3b-q3_k_m - Download Q3 quant (~17GB, Mac M4 friendly)
ollama pull qwen3.6:35b-a3b-q4_k_m - Download Q4 quant (~21GB, balanced)
ollama serve - Start API server on localhost:11434
ollama list - Show downloaded models and sizes
ollama show qwen3.6:35b-a3b - Inspect model details and parameters

Hardware requirements

35B A3B Q3_K_M: ~17GB VRAM (Mac M4 16GB confirmed working)
35B A3B Q4_K_M: ~21GB VRAM (RTX 4090 24GB recommended)
35B A3B BF16: ~70GB VRAM (RTX 6000 96GB or multi-GPU)
27B Dense Q4_K_M: ~16GB VRAM (RTX 4090 24GB minimum)
27B Dense IQ4_XS: fits 16GB VRAM with KV cache compression
macOS: Apple Silicon with Metal acceleration (M1 Pro+ recommended)
20-40 tok/s on consumer hardware for 35B A3B 4-bit
CPU fallback available but significantly slower (~2-5 tok/s)

Fixes over Qwen 3.5

Vision/multimodal input: broken in 3.5, fully working in 3.6
Tool calling/function invocation: broken in 3.5, fixed in 3.6
Improved context handling and memory efficiency
Better quantization quality at lower bit widths

Start Chatting Ollama documentation

Setup guides

Get Qwen 3.6 running with Ollama on any platform

Step-by-step guides for installing Ollama and configuring Qwen 3.6 on your platform, with hardware-specific optimization tips.

macOS setup (Apple Silicon)

Install Ollama and run Qwen 3.6 on M1/M2/M3/M4 Macs with Metal acceleration

Linux setup (NVIDIA)

NVIDIA GPU setup with CUDA acceleration for maximum throughput

Windows setup

WSL2 and native Windows installation with GPU passthrough

Docker setup

Run Ollama in a container with GPU access for reproducible deployments

Mac M4 16GB guide

Run 35B A3B with Q3 quantization on Mac M4 with 16GB RAM

Multi-GPU setup

Split large models across multiple GPUs for better performance

Advanced configuration

Optimize Qwen 3.6 performance and integrate with coding tools

Fine-tune model performance with Modelfiles, GPU configuration, context settings, and connect to your development environment.

Modelfile guide

Custom system prompts, temperature, context length, and chat templates

GPU optimization

VRAM management, layer offloading, and batch size tuning

Claude Code integration

Use Qwen 3.6 via Ollama as a backend for Claude Code

Continue.dev setup

AI coding assistant in VS Code with local Qwen 3.6

Aider integration

AI pair programming with Ollama-hosted Qwen 3.6

API integration

Connect Ollama's localhost:11434 to any OpenAI-compatible tool

Qwen ecosystem

Ollama is the fastest path to local Qwen 3.6 - one command, full capabilities

One-command setup with automatic GPU detection, model management, vision support, tool calling, and an OpenAI-compatible API at localhost:11434 for seamless integration with Claude Code, Aider, Continue.dev, and more.

Explore all models Ollama library

Qwen 3.6 35B A3B

MoE model, 20-40 tok/s on consumer GPU

Run locally

Qwen 3.6 27B

Dense model, maximum local performance

Run locally

Ollama library

Browse all available Qwen model tags and quantizations

Browse

Modelfile reference

Customize model behavior, context, and parameters

Read docs

API reference

OpenAI-compatible API at localhost:11434

View API

Community

Get help from the Ollama and Qwen communities

Join

Get started

Ready to run Qwen 3.6 with Ollama? One command is all you need

Try Qwen 3.6 in the browser first, then install Ollama for local deployment. Run 'ollama run qwen3.6:35b-a3b' to download, configure, and start chatting with 20-40 tok/s on consumer hardware. Vision, tool calling, and coding tool integration all work out of the box.

Start Chatting Install Ollama