Benchmarks
MMLU-Pro
62.2%
GPQA Diamond
37.1%
HLE
3.7%
LiveCodeBench
15.9%
SciCode
22.9%
TerminalBench Hard
Not evaluated
MATH-500
70.1%
AIME
14.7%
AIME 2025
Not evaluated
IFBench
Not evaluated
Long Context Recall
Not evaluated
Tau2
Not evaluated