Hermes 4 - Llama-3.1 405B (Non-reasoning)

Nous Research·Released 2025-08-27
Open Source

Pricing

Input: $1.00 per 1M tokens

Output: $3.00 per 1M tokens

Blended: $1.50 per 1M tokens
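
The page does not say how the blended figure is weighted. As a minimal sketch, assuming a 3:1 input-to-output token mix (an assumption, chosen because it reproduces the listed $1.50), the blend is a weighted average of the two prices. The function name `blended_price` and its parameters are illustrative only.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token mix.

    The 3:1 default is an assumption, not something the page states.
    """
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# $1.00 input and $3.00 output, weighted 3:1
print(f"${blended_price(1.00, 3.00):.2f} per 1M tokens")
```

A workload with a different mix (for example, long generations from short prompts) would shift the effective price toward the $3.00 output rate.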

Cheaper than 29% of models. Median price is $0.54/1M tokens.

Cost Calculator

Tokens per day: 1M (range: 100K to 100M)

Daily: $1.50

Monthly: $45.00
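
The calculator figures can be reproduced with simple arithmetic. This sketch assumes the calculator uses the blended $1.50 price and a 30-day month (both inferred from the numbers shown, not stated on the page); `token_cost` is a hypothetical helper name.

```python
def token_cost(tokens_per_day: int, price_per_million: float,
               days_per_month: int = 30) -> tuple[float, float]:
    """Return (daily, monthly) cost in dollars for a given token volume."""
    daily = tokens_per_day / 1_000_000 * price_per_million
    monthly = daily * days_per_month  # 30-day month is an assumption
    return daily, monthly


daily, monthly = token_cost(1_000_000, 1.50)
print(f"Daily: ${daily:.2f}, Monthly: ${monthly:.2f}")
```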

vs. Similar Models

Gemini 2.0 Flash-Lite (Feb '25) · Q: 0.0 · $0.13 · -91%
NVIDIA Nemotron Nano 9B V2 (Reasoning) · Q: 0.0 · $0.07 · -95%
Qwen3.5 2B (Non-reasoning) · Q: 0.0 · $0.04 · -97%
Gemma 4 E4B (Non-reasoning) · Q: +0.1 · $0.54 · -64%
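
The percentages in this comparison appear to be each model's price relative to this model's blended $1.50. That reading is an assumption, though it matches all four rows. A short sketch of the calculation, with `percent_vs_baseline` as an illustrative name:

```python
def percent_vs_baseline(price: float, baseline: float = 1.50) -> int:
    """Price difference relative to the baseline, as a rounded percentage."""
    return round((price - baseline) / baseline * 100)


# Prices taken from the comparison rows above
for name, price in [
    ("Gemini 2.0 Flash-Lite (Feb '25)", 0.13),
    ("NVIDIA Nemotron Nano 9B V2 (Reasoning)", 0.07),
    ("Qwen3.5 2B (Non-reasoning)", 0.04),
    ("Gemma 4 E4B (Non-reasoning)", 0.54),
]:
    print(f"{name}: {percent_vs_baseline(price):+d}%")
```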

Performance

40 tokens/sec · Faster than 9% of models

0.81 seconds · Faster than 65% of models

0.81 seconds · Faster than 76% of models

Market Median: 94 tok/s (58% slower)

Median TTFT: 1.10s (27% faster)

Throughput/Dollar: 27 tok/s per $/1M
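
The Throughput/Dollar figure is consistent with dividing output speed by the blended price. The page does not define the formula, so treat this as an assumption; `throughput_per_dollar` is a hypothetical name.

```python
def throughput_per_dollar(tokens_per_sec: float, price_per_million: float) -> float:
    """Output speed (tok/s) divided by price ($ per 1M tokens)."""
    return tokens_per_sec / price_per_million


# 40 tok/s at the blended $1.50 per 1M tokens
value = throughput_per_dollar(40, 1.50)
print(f"{value:.0f} tok/s per $/1M")
```

The displayed inputs are themselves rounded, so ratios recomputed from them (such as the gap to the market median) can differ from the page's figures by a point.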

Speed Comparison

Devstral 2 · 40 tok/s · +0%
Qwen3.5 4B (Non-reasoning) · 40 tok/s · -0%
Devstral Small (Jul '25) · 40 tok/s · +0%

Benchmarks

MMLU-Pro: 72.9%
GPQA Diamond: 53.6%
HLE: 4.2%
LiveCodeBench: 54.6%
SciCode: 34.6%
TerminalBench Hard: 9.8%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: 15.3%
IFBench: 34.8%
Long Context Recall: 20.0%
Tau2: 26.6%