Benchmarks
MMLU-Pro
80.3%
GPQA Diamond
65.5%
HLE
5.0%
LiveCodeBench
42.5%
SciCode
36.6%
TerminalBench Hard
Not evaluated
MATH-500
89.3%
AIME
32.7%
AIME 2025
25.7%
IFBench
Not evaluated
Long Context Recall
Not evaluated
Tau2
Not evaluated