Benchmarks
MMLU-Pro: 32.5%
GPQA Diamond: 28.1%
HLE: 5.1%
LiveCodeBench: 4.7%
SciCode: 8.7%
TerminalBench Hard: 0.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 6.3%
IFBench: 20.5%
Long Context Recall: 4.0%
Tau2: 22.8%