Benchmarks
MMLU-Pro: 58.8%
GPQA Diamond: 51.5%
HLE: 5.8%
LiveCodeBench: 51.6%
SciCode: 9.3%
TerminalBench Hard: 0.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 50.3%
IFBench: 23.0%
Long Context Recall: 0.0%
Tau2: 16.4%