Benchmarks
MMLU-Pro: 37.1%
GPQA Diamond: 24.0%
HLE: 5.1%
LiveCodeBench: 3.9%
SciCode: 3.6%
TerminalBench Hard: 0.0%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 0.0%
IFBench: 19.7%
Long Context Recall: 0.0%
Tau2: 0.0%