Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Input: $0.08 per 1M tokens
Output: $0.50 per 1M tokens
Blended: $0.18 per 1M tokens
Cheaper than 71% of models. Median price is $0.56/1M tokens.
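The page does not say how the blended price is derived. One reading that fits the listed figures is a 3:1 weighted average of input and output prices, which gives about $0.185 and would display as $0.18 if truncated. The sketch below works under that assumption; the function name and the default weights are illustrative, not taken from the source.

```python
def blended_price(input_price, output_price, input_weight=3.0, output_weight=1.0):
    """Weighted average price per 1M tokens.

    The 3:1 input:output weighting is an assumption, not stated on the page.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices: $0.08 input, $0.50 output per 1M tokens.
blended = blended_price(0.08, 0.50)
print(f"Blended: ${blended:.3f} per 1M tokens")
```

A different traffic mix (for example, output-heavy generation) would shift the blended figure toward the output price, so the weights are worth adjusting to your own workload.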
Daily: $0.18
Monthly: $5.55
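The usage level behind the daily and monthly figures is not stated. They are consistent with 1M tokens per day over a 30-day month at an unrounded blended price of about $0.185 per 1M tokens (the 3:1 weighted average of the listed input and output prices). A minimal sketch under those assumptions, with a hypothetical helper name:

```python
def usage_cost(price_per_million, tokens_per_day, days_per_month=30):
    """Return (daily_cost, monthly_cost) in dollars.

    tokens_per_day and days_per_month are assumptions about the workload.
    """
    daily = price_per_million * tokens_per_day / 1_000_000
    return daily, daily * days_per_month


# Assumed: 1M tokens/day at an unrounded blended price of $0.185 per 1M tokens.
daily, monthly = usage_cost(0.185, 1_000_000)
print(f"Daily: ${daily:.3f}, Monthly: ${monthly:.2f}")
```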
147 tokens/sec (faster than 77% of models)
0.94 seconds (faster than 58% of models)
0.94 seconds (faster than 70% of models)
Market Median: 86 tok/s (71% faster)
Median TTFT: 1.07s (12% faster)
Throughput/Dollar: 794 tok/s per $/1M
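The comparison figures above can be reproduced from the listed numbers. The sketch below assumes that "faster" means higher throughput relative to the median for speed, lower latency relative to the median for TTFT, and that throughput per dollar divides tokens/sec by an unrounded blended price of about $0.185 per 1M tokens. The formulas and function names are inferred for illustration, not documented by the page.

```python
def percent_faster_throughput(model_tps, median_tps):
    """How much higher the model's throughput is than the median, in percent."""
    return (model_tps / median_tps - 1) * 100


def percent_faster_latency(model_seconds, median_seconds):
    """How much lower the model's latency is than the median, in percent."""
    return (1 - model_seconds / median_seconds) * 100


def throughput_per_dollar(tokens_per_second, price_per_million):
    """Tokens/sec per dollar of blended price per 1M tokens."""
    return tokens_per_second / price_per_million


print(round(percent_faster_throughput(147, 86)))   # speed vs. market median
print(round(percent_faster_latency(0.94, 1.07)))   # TTFT vs. median TTFT
print(int(throughput_per_dollar(147, 0.185)))      # assumed unrounded price
```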
Context Window: 131K tokens (larger than 33% of models)
Max Output: 33K tokens (25% of context)
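As a quick check on the "25% of context" figure, the sketch below assumes the rounded values 33K and 131K stand for 32,768 and 131,072 tokens. Those exact counts are an assumption and do not appear on the page.

```python
def output_share(max_output_tokens, context_tokens):
    """Fraction of the context window that a maximum-length output would occupy."""
    return max_output_tokens / context_tokens


# Assumed exact token counts behind the rounded 33K / 131K figures.
share = output_share(32_768, 131_072)
print(f"{share:.0%} of context")
```

In practice this means a maximum-length response leaves roughly three quarters of the window for the prompt, images, and prior turns.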
5.4M
897
8-16 GB
RTX 4070 / M2 Pro
Quality Index: 14.3 (344th of 507, Top 68%)
Coding Index: 7.3 (354th of 417, Top 85%)
Math Index: 27.3 (190th of 269, Top 71%)
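The "Top N%" labels appear to be the rank divided by the number of ranked models, so a larger percentage means a lower placement. A small sketch of that reading follows; the formula is inferred from the listed ranks, not documented by the page, and the function name is hypothetical.

```python
def top_percent(rank, total):
    """Rank expressed as a percentage of the field (smaller is better)."""
    return round(100 * rank / total)


# Ranks as listed on the page.
for name, rank, total in [
    ("Quality", 344, 507),
    ("Coding", 354, 417),
    ("Math", 190, 269),
]:
    print(f"{name} Index: Top {top_percent(rank, total)}%")
```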
Price/1M: $0.18 (181st cheapest, 67% below median, Top 29%)
Speed: 147 tok/s (Top 23%)
TTFT: 0.94s
Context Window: 131K (201st largest, Top 67%)