Qwen: Qwen3 VL 32B Instruct
About
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Pricing
Input: $0.10 per 1M tokens
Output: $0.42 per 1M tokens
Blended: $0.18 per 1M tokens
Cheaper than 71% of models. Median price is $0.54/1M tokens.
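The per-token prices above can be turned into a per-request estimate. The sketch below is a minimal illustration and not the listing's own calculator: the function names are hypothetical, and the 3:1 input-to-output weighting is an assumption that happens to reproduce the listed $0.18 blended figure. The listing itself does not state how the blended price is derived.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.42


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def blended_price(input_share: float = 0.75) -> float:
    """Weighted price per 1M tokens.

    The default 0.75 input share (a 3:1 input:output mix) is an assumption,
    not something the listing documents.
    """
    return input_share * INPUT_PRICE_PER_M + (1 - input_share) * OUTPUT_PRICE_PER_M


if __name__ == "__main__":
    print(f"blended: ${blended_price():.2f} per 1M tokens")
    print(f"750K in + 250K out: ${request_cost(750_000, 250_000):.2f}")
```

Workloads that are heavier on output will see a higher effective price than the blended figure, since output tokens cost about four times as much as input tokens.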
Cost Calculator
Daily: $0.18
Monthly: $5.46
Performance
72 tokens/sec (faster than 37% of models)
1.10 seconds (faster than 52% of models)
1.10 seconds (faster than 66% of models)
Market Median: 94 tok/s (24% slower)
Median TTFT: 1.11s (1% faster)
Throughput/Dollar: 395 tok/s per $/1M
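As a rough illustration of what these figures imply for a single response, the sketch below estimates wall-clock time as time to first token plus generation time. It rests on assumptions: one of the 1.10-second figures is treated as time to first token (suggested by the median TTFT comparison, but not labeled in the listing), throughput is treated as constant, and the function names are hypothetical. The throughput-per-dollar helper assumes the metric is tokens/sec divided by the blended price; with the rounded inputs shown on this page that gives roughly 400 rather than the listed 395, presumably because the page uses unrounded values.

```python
# Figures copied from the listing above.
TOKENS_PER_SEC = 72.0
TTFT_SECONDS = 1.10          # assumption: this 1.10 s figure is time to first token
BLENDED_PRICE_PER_M = 0.18   # USD per 1M tokens


def estimate_seconds(output_tokens: int) -> float:
    """Estimate response time: first-token delay plus steady-state generation."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SEC


def throughput_per_dollar() -> float:
    """Tokens/sec per (USD per 1M tokens); the definition is an assumption."""
    return TOKENS_PER_SEC / BLENDED_PRICE_PER_M


if __name__ == "__main__":
    print(f"720-token reply: about {estimate_seconds(720):.1f} s")
    print(f"throughput per dollar: about {throughput_per_dollar():.0f}")
```

Real latency varies with provider load, prompt length, and batching, so this is a planning estimate only.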
Context Window: 262K tokens (larger than 62% of models)
Max Output: 33K tokens (13% of context)
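The two limits above can be checked before sending a request. This is a minimal sketch under stated assumptions: the rounded values 262,000 and 33,000 stand in for the exact limits (which the listing does not give), output tokens are assumed to count against the context window, and `fits` is a hypothetical helper name.

```python
# Rounded limits from the listing; the exact token counts are not stated there.
CONTEXT_WINDOW = 262_000
MAX_OUTPUT = 33_000


def fits(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the request respects both the output cap and the context window.

    Assumes prompt and generated tokens share the same context window.
    """
    if max_new_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW


def output_share_percent() -> int:
    """Max output as a rounded percentage of the context window."""
    return round(MAX_OUTPUT / CONTEXT_WINDOW * 100)


if __name__ == "__main__":
    print(fits(200_000, 33_000))
    print(fits(250_000, 33_000))
    print(output_share_percent())
```

With these rounded limits, a prompt can use up to about 229K tokens while still leaving room for a maximum-length response.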
Open Source
5.4M
207
24-48 GB (A6000 / M3 Ultra)