Benchmarks
MMLU-Pro
12.7%
GPQA Diamond
25.7%
HLE
6.4%
LiveCodeBench
1.9%
SciCode
1.7%
TerminalBench Hard
0.0%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
1.3%
IFBench
17.6%
Long Context Recall
0.0%
Tau2
14.6%