Benchmarks
MMLU-Pro: 28.2%
GPQA Diamond: 28.8%
HLE: 5.5%
LiveCodeBench: 4.1%
SciCode: 3.7%
TerminalBench Hard: 0.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 0.7%
IFBench: 24.4%
Long Context Recall: 0.0%
Tau2: 0.0%