GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.
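The multimodal function-calling flow described above can be sketched as a single chat request that pairs an image with a text prompt and exposes a tool for the model to call. This is a minimal sketch assuming an OpenAI-compatible message/tool schema; the model identifier, image URL, and the `get_chart_value` tool are illustrative assumptions, not documented API details.

```python
# Sketch of a multimodal function-calling request payload.
# Assumes an OpenAI-compatible chat schema; the model name, tool name,
# and image URL below are illustrative placeholders.

def build_request(image_url: str, question: str) -> dict:
    """Pair an image with a text prompt and expose one callable tool."""
    return {
        "model": "glm-4.6v",  # assumed model identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_chart_value",  # hypothetical downstream tool
                "description": "Look up the exact value behind a chart point.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "series": {"type": "string"},
                        "x": {"type": "string"},
                    },
                    "required": ["series", "x"],
                },
            },
        }],
    }

req = build_request("https://example.com/chart.png",
                    "What is the Q3 value in this chart?")
print(req["model"])
```

The model perceives the chart visually, then emits a structured `tool_call` when it decides the question needs the downstream lookup, which is what "connecting perception with tool execution" amounts to in practice.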
Quality Index: 17.1 (230th of 444 · Top 52%)
Coding Index: 11.1 (247th of 354 · Top 70%)
Math Index: 26.3 (193rd of 268 · Top 72%)
Price (blended, per 1M tokens): $0.45 (383rd cheapest · 50% above median · Top 56%)
Speed: 20 tok/s (Top 60%)
TTFT: 5.89 s
Context Window: 131K (145th largest · Top 63%)
Pricing (per 1M tokens)
Input: $0.30
Output: $0.90
Blended: $0.45
Cheaper than 44% of models; median price is $0.30/1M tokens.
Daily: $0.45 · Monthly: $13.50
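The blended rate is consistent with the common 3:1 input:output token weighting, and the monthly figure is simply 30 daily costs. A quick sanity check (the 3:1 mix is an assumption about how the blend is computed, not something stated on the page):

```python
# Check the blended price under an assumed 3:1 input:output token mix,
# and the monthly cost as 30x the daily blended cost.
input_price = 0.30   # $ per 1M input tokens
output_price = 0.90  # $ per 1M output tokens

blended = (3 * input_price + 1 * output_price) / 4
monthly = 0.45 * 30  # daily blended cost x 30 days

print(round(blended, 2))  # 0.45
print(round(monthly, 2))  # 13.5
```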
Speed Comparison
Throughput: 20 tokens/sec (faster than 40% of models)
TTFT: 5.89 seconds (faster than 10% of models)
Market median throughput: 45 tok/s (55% slower than median)
Median TTFT: 0.42 s (1308% slower than median)
Throughput/Dollar: 45 tok/s per $/1M
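The comparison figures above can be roughly reproduced from the displayed numbers: relative slowdown is the gap to the median divided by the baseline, and throughput per dollar is speed divided by blended price. The page presumably computes its 55%, 1308%, and 45 from unrounded measurements, so recomputing from the rounded values shown gives slightly different results (56%, 1302%, and 44):

```python
# Reproduce the comparison figures from the rounded values shown above.
# The page's displayed percentages likely use unrounded measurements,
# so these reproductions differ slightly.
speed, median_speed = 20.0, 45.0   # tokens/sec
ttft, median_ttft = 5.89, 0.42     # seconds
blended_price = 0.45               # $ per 1M tokens

speed_gap = (median_speed - speed) / median_speed  # fraction slower than median
ttft_gap = (ttft - median_ttft) / median_ttft      # relative TTFT slowdown
tok_per_dollar = speed / blended_price             # throughput per $/1M

print(f"{speed_gap:.0%} slower")   # 56% slower
print(f"{ttft_gap:.0%} slower")    # 1302% slower
print(round(tok_per_dollar))       # 44
```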
Context Window: 131K tokens (larger than 37% of models)
Max Output: 131K tokens (100% of context)