Benchmarks
MMLU-Pro
67.2%
GPQA Diamond
51.7%
HLE
4.7%
LiveCodeBench
37.7%
SciCode
18.1%
TerminalBench Hard
4.5%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
52.3%
IFBench
33.5%
Long Context Recall
7.3%
Tau2
26.6%