Benchmarks
MMLU-Pro: 80.6%
GPQA Diamond: 77.4%
HLE: 10.2%
LiveCodeBench: 64.3%
SciCode: 36.7%
TerminalBench Hard: 6.8%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 89.3%
IFBench: 44.6%
Long Context Recall: 45.7%
Tau2: 26.3%