Benchmarks
MMLU-Pro: 76.8%
GPQA Diamond: 62.8%
HLE: 4.9%
LiveCodeBench: 47.2%
SciCode: 25.2%
TerminalBench Hard: 1.5%
MATH-500: 93.9%
AIME: 47.0%
AIME 2025: 39.3%
IFBench: 33.5%
Long Context Recall: 8.0%
Tau2: 4.1%