Qwen3.5 9B

Alibaba · Released 2026-02-27
Open Source · 9B · 262K ctx · Apache 2.0 · Multimodal

About

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Pricing

Input

$0.10

per 1M tokens

Output

$0.15

per 1M tokens

Blended

$0.11

per 1M tokens

Cheaper than 82% of models. Median price is $0.54/1M tokens.

Cost Calculator

Tokens per day: 1M (range: 100K to 100M)

Daily

$0.11

Monthly

$3.37
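
The blended and calculator figures above can be approximated with a short calculation. This is only a sketch under two assumptions the page does not state: that the blended price weights input and output tokens 3:1, and that a month is counted as 30 days. The function names are illustrative, not part of any published API.

```python
# Listed prices in USD per 1M tokens (from the pricing figures above).
INPUT_PRICE = 0.10
OUTPUT_PRICE = 0.15


def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens.

    The 3:1 input:output weighting is an assumption, not stated on the page.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


def daily_cost(tokens_per_day, price_per_million):
    """Cost in USD for one day of traffic at a given per-1M-token price."""
    return tokens_per_day / 1_000_000 * price_per_million


def monthly_cost(tokens_per_day, price_per_million, days_per_month=30):
    """Cost in USD for a month; the 30-day month is an assumption."""
    return daily_cost(tokens_per_day, price_per_million) * days_per_month


if __name__ == "__main__":
    price = blended_price(INPUT_PRICE, OUTPUT_PRICE)
    print(f"blended: ${price:.4f} per 1M tokens")
    print(f"daily at 1M tokens/day: ${daily_cost(1_000_000, price):.4f}")
    print(f"monthly at 1M tokens/day: ${monthly_cost(1_000_000, price):.4f}")
```

Under these assumptions the blended price comes to about $0.1125 per 1M tokens, which is consistent with the displayed $0.11 daily figure and roughly $3.37 per month at 1M tokens per day.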

vs. Similar Models

Google: Gemini 3.1 Flash Lite Preview · Q: 0.0 · $0.56 · +400%
Qwen3 Max Thinking (Preview) · Q: 0.0 · $2.40 · +2033%
GLM-4.6 (Reasoning) · Q: +0.1 · $0.96 · +756%
Gemma 4 31B (Non-reasoning) · Q: -0.2 · $0.20 · +82%

Performance

68

tokens/sec

Faster than 34% of models

0.88

seconds (time to first token)

Faster than 61% of models

30.46

seconds

Faster than 16% of models

Market Median

94 tok/s

28% slower

Median TTFT

1.11s

21% faster

Throughput/Dollar

601

tok/s per $/1M
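
The relative figures in this section can be sanity-checked with simple arithmetic. The sketch below assumes that "slower" and "faster" are measured relative to the market median, and that throughput per dollar is tokens per second divided by the blended price per 1M tokens. Neither definition is stated on the page, and the helper names are hypothetical.

```python
def percent_difference(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    return (value - reference) / reference * 100


def throughput_per_dollar(tokens_per_sec, price_per_million):
    """Tokens/sec divided by the USD price per 1M tokens."""
    return tokens_per_sec / price_per_million


if __name__ == "__main__":
    # Speed: 68 tok/s against a market median of 94 tok/s.
    print(f"speed vs. median: {percent_difference(68, 94):.1f}%")
    # TTFT: 0.88 s against a median of 1.11 s (negative means lower latency).
    print(f"TTFT vs. median: {percent_difference(0.88, 1.11):.1f}%")
    # Uses the rounded blended price of $0.11; an unrounded price shifts the result.
    print(f"throughput per dollar: {throughput_per_dollar(68, 0.11):.0f}")
```

The first two values land near -28% and -21%, in line with the displayed "28% slower" and "21% faster". The throughput-per-dollar result comes out close to, but not exactly at, the displayed 601, presumably because the page works from unrounded speed and price values.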

Speed Comparison

DeepSeek: R1 Distill Llama 70B · 67 tok/s · -1%
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) · 68 tok/s · +1%
GPT-5.2 (Non-reasoning) · 67 tok/s · -1%

Context Window

262K

tokens

Larger than 62% of models

Max Output

262K

tokens

100% of context

Benchmarks

MMLU-Pro: Not evaluated
GPQA Diamond: 80.6%
HLE: 13.3%
LiveCodeBench: Not evaluated
SciCode: 27.5%
TerminalBench Hard: 24.2%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: Not evaluated
IFBench: 66.7%
Long Context Recall: 59.0%
Tau2: 86.8%
apache-2.0 · 9B · GGUF / GPTQ / AWQ
Downloads

9.2M

Likes

1.6K

VRAM (FP16)

16-24 GB

GPU

RTX 4090 / M2 Max
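
As a rough check on the VRAM figure, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per parameter. This is a back-of-the-envelope sketch: it assumes exactly 9e9 parameters (the nominal "9B") and ignores the KV cache, activations, and runtime overhead. The function name is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    NUM_PARAMS = 9e9  # nominal "9B"; the exact parameter count is an assumption
    for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At FP16 the weights alone come to about 18 GB, which falls inside the listed 16-24 GB range. Lower-precision variants, such as those commonly distributed in the GGUF, GPTQ, and AWQ formats listed above, would need proportionally less memory for the weights.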
