Benchmarks
MMLU-Pro: 43.1%
GPQA Diamond: 27.7%
HLE: 4.6%
LiveCodeBench: 11.6%
SciCode: 11.7%
TerminalBench Hard: Not evaluated
MATH-500: 40.3%
AIME: 0.7%
AIME 2025: Not evaluated
IFBench: Not evaluated
Long Context Recall: Not evaluated
Tau2: Not evaluated