Benchmarks
MMLU-Pro
57.7%
GPQA Diamond
33.3%
HLE
4.6%
LiveCodeBench
21.0%
SciCode
5.9%
TerminalBench Hard
0.8%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
10.7%
IFBench
52.4%
Long Context Recall
7.0%
Tau2
15.8%