Benchmarks
MMLU-Pro: 75.3%
GPQA Diamond: 67.9%
HLE: 9.5%
LiveCodeBench: 52.7%
SciCode: 29.7%
TerminalBench Hard: 9.1%
MATH-500: 91.7%
AIME: 70.0%
AIME 2025: 40.3%
IFBench: 25.1%
Long Context Recall: 0.0%
Tau2: 23.1%