Benchmarks
MMLU-Pro: 74.6%
GPQA Diamond: 64.1%
HLE: 7.2%
LiveCodeBench: 51.4%
SciCode: 24.1%
TerminalBench Hard: 4.5%
MATH-500: 96.3%
AIME: 71.3%
AIME 2025: 41.3%
IFBench: 24.8%
Long Context Recall: 0.0%
Tau2: 26.6%