Performance
131 tokens/sec (faster than 66% of models)
0.42 seconds (faster than 94% of models)
0.42 seconds (faster than 96% of models)
Market Median: 95 tok/s (39% faster)
Median TTFT: 1.11s (62% faster)
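The "39% faster" and "62% faster" figures can be roughly reproduced from the numbers shown on this page. The sketch below is an assumption about how such percentages are typically derived, not the site's documented method: throughput gain is measured relative to the market median, and latency reduction relative to the median TTFT. It also assumes the 0.42 s figure is this model's time to first token. The function names are hypothetical.

```python
def percent_faster_throughput(model_tps: float, baseline_tps: float) -> float:
    """Relative throughput gain over a baseline, in percent (higher tok/s is better)."""
    return (model_tps - baseline_tps) / baseline_tps * 100.0


def percent_faster_latency(model_s: float, baseline_s: float) -> float:
    """Relative latency reduction versus a baseline, in percent (lower seconds is better)."""
    return (baseline_s - model_s) / baseline_s * 100.0


# Rounded values as displayed on the page.
throughput_gain = percent_faster_throughput(131, 95)
ttft_gain = percent_faster_latency(0.42, 1.11)

print(f"throughput: {throughput_gain:.1f}% faster than the market median")
print(f"TTFT: {ttft_gain:.1f}% faster than the median TTFT")
```

With the rounded inputs, the TTFT result matches the displayed 62%, while the throughput result comes out near 38% rather than 39%. The page presumably computes from unrounded measurements, so a one-point difference is expected.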
Speed Comparison
MoonshotAI: Kimi K2 Thinking
132 tok/s (+0%)
Tiny Aya Global
133 tok/s (+1%)
IBM: Granite 4.1 8B
129 tok/s (-2%)
Benchmarks
MMLU-Pro: Not evaluated
GPQA Diamond: 42.5%
HLE: 4.4%
LiveCodeBench: Not evaluated
SciCode: 13.3%
TerminalBench Hard: 0.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: Not evaluated
IFBench: 26.9%
Long Context Recall: 0.0%
Tau2: 0.0%
Open Source
License: apache-2.0
Parameters: 8B
Formats: GGUF / GPTQ / AWQ
Downloads: 669.2K
Likes: 189
VRAM (FP16): 8-16 GB
GPU: RTX 4070 / M2 Pro
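As a rough sanity check on the VRAM figure, weight memory can be estimated as parameter count times bytes per parameter. This is a minimal sketch under stated assumptions: it counts weights only (KV cache, activations, and framework overhead add to this), it uses decimal gigabytes, and the precision table and function name are illustrative. How the page derives its 8-16 GB range is not stated.

```python
# Bytes needed to store one parameter at each precision (illustrative table).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Estimate memory for model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


params = 8e9  # the 8B parameter count listed above
for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gb(params, precision):.1f} GB for weights")
```

Under these assumptions, FP16 weights for an 8B model come to about 16 GB, the upper end of the listed range, and 8-bit weights to about 8 GB. Quantized formats such as GGUF, GPTQ, and AWQ reduce the footprint further, which is what makes smaller consumer GPUs practical.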