Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion of its 109 billion total parameters per token. It accepts native multimodal input (text and image) and produces text and code output across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts in its MoE layers, offers a context length of 10 million tokens, and was pretrained on a corpus of roughly 40 trillion tokens. Built for efficient local and commercial deployment, it uses early fusion to integrate text and image modalities in a single backbone. The model is instruction-tuned for multilingual chat, captioning, and image-understanding tasks. It is released under the Llama 4 Community License, has a training-data cutoff of August 2024, and was publicly launched on April 5, 2025.
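The snippet below is a minimal sketch of running the model for multimodal chat with Hugging Face transformers. It assumes transformers 4.51 or newer (the first release with Llama 4 support), approved access to the gated meta-llama/Llama-4-Scout-17B-16E-Instruct repository, and enough GPU memory for the 109B-parameter weights; the image URL is a placeholder.

```python
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

# Public Hugging Face repo id for this model; access is gated and must be requested from Meta.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",           # shard across available GPUs
    torch_dtype=torch.bfloat16,  # 109B total parameters; bf16 still needs substantial memory
)

# Early fusion: image and text content pass through one chat template and one backbone.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])[0])
```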
Quality Index: 13.5 (299th of 444, top 68%)
Coding Index: 6.7 (298th of 354, top 85%)
Math Index: 14.0 (220th of 268, top 83%)
Price: $0.29 per 1M tokens (334th cheapest, 3% below median, top 49%)
Speed: 128 tokens/sec (top 20%)
TTFT: 0.45 seconds
Context Window: 328K tokens (58th largest, top 16%)
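For reference, the "top N%" labels above can be reproduced from the rank and pool size; the sketch below assumes they are computed as ceil(100 · rank / total), which matches the displayed figures, though the site's exact rounding rule is an assumption.

```python
import math

# Sketch: deriving the "top N%" labels from rank and pool size.
# Assumption: label = ceil(100 * rank / total); this reproduces the page's figures.
metrics = {
    "Quality Index": (299, 444),
    "Coding Index": (298, 354),
    "Math Index": (220, 268),
}

for name, (rank, total) in metrics.items():
    print(f"{name}: top {math.ceil(100 * rank / total)}%")
# Quality Index: top 68%
# Coding Index: top 85%
# Math Index: top 83%
```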
Input: $0.17 per 1M tokens
Output: $0.66 per 1M tokens
Blended: $0.29 per 1M tokens
Cheaper than 51% of models; the median price is $0.30 per 1M tokens.
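The blended figure is consistent with a weighted average of the input and output prices; the sketch below assumes a 3:1 input-to-output token ratio, a common blending convention that is not stated on this page.

```python
# Sketch: blended price from input/output prices.
# Assumption: a 3:1 input:output token ratio (common blending convention).
input_price = 0.17   # $ per 1M input tokens
output_price = 0.66  # $ per 1M output tokens

blended = (3 * input_price + 1 * output_price) / 4
print(f"Blended: ${blended:.2f} per 1M tokens")  # Blended: $0.29 per 1M tokens
```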
Daily: $0.29
Monthly: $8.76
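The daily figure matches the blended per-1M-token price, which suggests the estimate assumes roughly one million blended tokens per day; that workload and the 30-day month in the sketch below are assumptions, not figures stated on the page.

```python
# Sketch: reproducing the daily/monthly cost estimates.
# Assumptions: ~1M blended tokens per day and a 30-day month (not stated on the page).
blended_price = 0.292          # $ per 1M blended tokens (unrounded value implied by $8.76/month)
tokens_per_day_millions = 1.0

daily = blended_price * tokens_per_day_millions
monthly = daily * 30
print(f"Daily: ${daily:.2f}  Monthly: ${monthly:.2f}")  # Daily: $0.29  Monthly: $8.76
```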
Speed: 128 tokens/sec (faster than 80% of models)
TTFT: 0.45 seconds (faster than 49% of models)

Speed Comparison
Market median speed: 45 tokens/sec (this model is 182% faster)
Median TTFT: 0.42 seconds (this model is 7% slower)
Throughput per dollar: 438 tokens/sec per $/1M tokens
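The comparison figures follow from the raw numbers above; in the sketch below, the small gaps against the page's values (184% vs. 182% faster, 441 vs. 438 tokens/sec per dollar) are assumed to come from the page using unrounded underlying values.

```python
# Sketch: relative speed, relative TTFT, and throughput-per-dollar from the figures above.
speed = 128          # tokens/sec
median_speed = 45    # tokens/sec (market median)
ttft = 0.45          # seconds
median_ttft = 0.42   # seconds (median)
blended_price = 0.29 # $ per 1M tokens

print(f"Speed vs. median: {100 * (speed - median_speed) / median_speed:.0f}% faster")  # ~184% (page: 182%)
print(f"TTFT vs. median: {100 * (ttft - median_ttft) / median_ttft:.0f}% slower")      # ~7% (page: 7%)
print(f"Throughput per dollar: {speed / blended_price:.0f} tok/s per $/1M")            # ~441 (page: 438)
```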
Context Window: 328K tokens (larger than 84% of models)
Max Output: 16K tokens (about 5% of the context window)
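The max-output share follows from the two figures above; a one-line check, assuming "K" denotes thousands of tokens:

```python
# Sketch: max output as a share of the context window (assuming K = 1,000 tokens).
context_window = 328_000
max_output = 16_000
print(f"{100 * max_output / context_window:.0f}% of context")  # 5% of context
```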