Benchmarks
MMLU-Pro
77.3%
GPQA Diamond
51.1%
HLE
3.7%
LiveCodeBench
Not evaluated
SciCode
33.4%
TerminalBench Hard
Not evaluated
MATH-500
79.7%
AIME
10.3%
AIME 2025
Not evaluated
IFBench
Not evaluated
Long Context Recall
53.0%
Tau2
Not evaluated