Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Input: $0.10 per 1M tokens
Output: $0.42 per 1M tokens
Blended: $0.18 per 1M tokens
Cheaper than 72% of models. Median price is $0.56/1M tokens.
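The page does not say how the blended price is derived. As a rough sketch, a 3:1 input-to-output token weighting (an assumption, not something stated here) happens to reproduce the $0.18 figure from the listed input and output prices, and the same arithmetic shows how far that sits below the $0.56 median. The function names are hypothetical.

```python
def blended_price(input_price, output_price, input_weight=3.0, output_weight=1.0):
    """Weighted average price per 1M tokens.

    The 3:1 default weighting is an assumption; the page does not state its formula.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


def pct_below(value, reference):
    """How far `value` sits below `reference`, as a percentage."""
    return (1 - value / reference) * 100


blended = blended_price(0.10, 0.42)          # listed input and output prices
print(f"blended: ${blended:.2f} per 1M tokens")
print(f"below median: {pct_below(blended, 0.56):.0f}%")
```

A different traffic mix changes the result: a workload that generates more output than it reads would pay closer to the $0.42 output rate.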
Daily: $0.18
Monthly: $5.46
75 tokens/sec (faster than 43% of models)
1.13 seconds (faster than 48% of models)
1.13 seconds (faster than 64% of models)
Market Median: 86 tok/s (13% slower)
Median TTFT: 1.07s (5% slower)
Throughput/Dollar: 412 tok/s per $/1M
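The comparisons against the median and the throughput-per-dollar figure can be approximated from the rounded numbers shown on this page. The sketch below assumes the straightforward ratios; the page does not state its exact formulas, and because the displayed inputs are rounded, the results can differ slightly from the listed values (for example, 75 / 0.18 gives about 417 rather than 412). The function names are hypothetical.

```python
def pct_slower_throughput(model_tps, median_tps):
    """Percent by which the model's tokens/sec falls short of the median."""
    return (1 - model_tps / median_tps) * 100


def pct_slower_latency(model_seconds, median_seconds):
    """Percent by which the model's time to first token exceeds the median."""
    return (model_seconds / median_seconds - 1) * 100


def throughput_per_dollar(tps, price_per_million):
    """Tokens/sec divided by the blended price per 1M tokens."""
    return tps / price_per_million


print(f"throughput vs median: {pct_slower_throughput(75, 86):.1f}% slower")
print(f"TTFT vs median: {pct_slower_latency(1.13, 1.07):.1f}% slower")
print(f"throughput per dollar: {throughput_per_dollar(75, 0.18):.0f} tok/s per $/1M")
```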
Speed Comparison
Context Window: 131K tokens (larger than 33% of models)
Max Output: 33K tokens (25% of context)
1.5M
199
24-48 GB
A6000 / M3 Ultra
Quality Index: 17.2 (285th of 507, Top 56%)
Coding Index: 15.6 (241st of 417, Top 58%)
Math Index: 68.3 (100th of 269, Top 37%)
Price/1M: $0.18 (180th cheapest, 68% below median, Top 28%)
Speed: 75 tok/s (Top 57%)
TTFT: 1.13s
Context Window: 131K (201st largest, Top 67%)
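The "Top N%" labels appear to be the rank divided by the number of ranked models, rounded to a whole percent. That is an inference from the three index rows where both the rank and the total are shown, not a documented formula. A minimal sketch, with a hypothetical helper name:

```python
def top_percent(rank, total):
    """Rank expressed as a whole-number percentage of the ranked field."""
    return round(rank / total * 100)


# (rank, total) pairs taken from the index rows on this page
rankings = {
    "Quality Index": (285, 507),
    "Coding Index": (241, 417),
    "Math Index": (100, 269),
}

for name, (rank, total) in rankings.items():
    print(f"{name}: Top {top_percent(rank, total)}%")
```

Read this way, a lower percentage is better: "Top 37%" for the Math Index means 100 of the 269 ranked models sit at or above this position.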