Benchmarks
MMLU-Pro: 75.0%
GPQA Diamond: 56.1%
HLE: 3.8%
LiveCodeBench: 42.4%
SciCode: 24.8%
TerminalBench Hard: 4.5%
MATH-500: 88.9%
AIME: 40.7%
AIME 2025: 30.0%
IFBench: 33.7%
Long Context Recall: 0.0%
Tau2: 31.9%