Benchmarks
MMLU-Pro: 83.0%
GPQA Diamond: 77.7%
HLE: 12.7%
LiveCodeBench: 81.2%
SciCode: 37.5%
TerminalBench Hard: 25.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 85.0%
IFBench: 41.4%
Long Context Recall: 6.7%
Tau2: 83.9%