Benchmarks
MMLU-Pro
51.1%
GPQA Diamond
32.8%
HLE
3.7%
LiveCodeBench
6.8%
SciCode
8.0%
TerminalBench Hard
0.0%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
3.3%
IFBench
38.1%
Long Context Recall
0.0%
Tau2
0.0%