Benchmarks
MMLU-Pro: 40.6%
GPQA Diamond: 32.1%
HLE: 4.7%
LiveCodeBench: 9.8%
SciCode: 11.8%
TerminalBench Hard: Not evaluated
MATH-500: 32.9%
AIME: 1.7%
AIME 2025: Not evaluated
IFBench: Not evaluated
Long Context Recall: Not evaluated
Tau2: Not evaluated