Benchmarks
MMLU-Pro: 80.8%
GPQA Diamond: 68.2%
HLE: 7.5%
LiveCodeBench: 65.7%
SciCode: 37.8%
TerminalBench Hard: 2.3%
MATH-500: 97.2%
AIME: 81.3%
AIME 2025: 13.7%
IFBench: 41.2%
Long Context Recall: 51.7%
Tau2: 31.6%