Benchmarks
MMLU-Pro: 70.9%
GPQA Diamond: 51.0%
HLE: 3.8%
LiveCodeBench: 26.7%
SciCode: 28.5%
TerminalBench Hard: Not evaluated
MATH-500: 77.8%
AIME: 13.3%
AIME 2025: Not evaluated
IFBench: Not evaluated
Long Context Recall: Not evaluated
Tau2: Not evaluated