Benchmarks
MMLU-Pro: 43.5%
GPQA Diamond: 31.9%
HLE: 4.4%
LiveCodeBench: 11.6%
SciCode: 9.0%
TerminalBench Hard: 0.0%
MATH-500: 45.7%
AIME: 4.0%
AIME 2025: 0.3%
IFBench: 23.9%
Long Context Recall: 2.0%
Tau2: 0.0%