Benchmarks
MMLU-Pro: 80.5%
GPQA Diamond: 68.7%
HLE: 7.0%
LiveCodeBench: 61.6%
SciCode: 30.2%
TerminalBench Hard: 3.0%
MATH-500: 96.7%
AIME: 69.0%
AIME 2025: 61.3%
IFBench: 37.1%
Long Context Recall: 0.0%
Tau2: 28.1%