Benchmarks
MMLU-Pro
74.3%
GPQA Diamond
66.7%
HLE
5.9%
LiveCodeBench
64.1%
SciCode
25.6%
TerminalBench Hard
1.5%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
82.7%
IFBench
49.8%
Long Context Recall
37.7%
Tau2
25.4%