Benchmarks
MMLU-Pro: 44.7%
GPQA Diamond: 33.6%
HLE: 5.1%
LiveCodeBench: 18.0%
SciCode: 11.9%
TerminalBench Hard: 1.5%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 6.0%
IFBench: 24.8%
Long Context Recall: 4.0%
Tau2: 12.6%