Related Models
Benchmarks
MMLU-Pro: 81.8%
GPQA Diamond: 73.9%
HLE: 10.5%
LiveCodeBench: 74.7%
SciCode: 34.4%
TerminalBench Hard: 3.8%
MATH-500: 97.7%
AIME: 84.3%
AIME 2025: 80.0%
IFBench: 36.3%
Long Context Recall: 14.0%
Tau2: 17.3%
Legend: Market Average, Best Score