
Hermes 4 - Llama-3.1 70B (Non-reasoning)

Nous Research·Released 2025-08-27
Open Source

Pricing

Input

$0.13

per 1M tokens

Output

$0.40

per 1M tokens

Blended

$0.20

per 1M tokens

Cheaper than 70% of models. Median price is $0.54/1M tokens.
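
The page does not say how the blended price is derived from the input and output prices. A minimal sketch, assuming a 3:1 input-to-output token weighting (a common convention, but an assumption here), gives a value that rounds to the listed $0.20. The function name `blended_price` and its weight parameters are hypothetical.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices (USD per 1M tokens).

    The 3:1 default weighting is an assumption, not something the page states.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Prices listed on the page: $0.13 input, $0.40 output per 1M tokens.
price = blended_price(0.13, 0.40)
print(f"Blended: ${price:.2f} per 1M tokens")
```

Changing the weights to match a workload's actual input/output mix gives a more representative per-token cost than the headline figure.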

Cost Calculator

Tokens per day: 1M (slider range: 100K to 100M)

Daily

$0.20

Monthly

$5.94
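
The calculator's arithmetic can be approximated as follows. This is a sketch under stated assumptions: the function name `estimate_cost` is hypothetical, it uses the rounded blended price of $0.20 per 1M tokens, and it assumes a 30-day month. With those inputs the monthly result is $6.00 rather than the $5.94 shown, which suggests the page works from an unrounded blended price; its exact formula is not given.

```python
def estimate_cost(tokens_per_day: float, price_per_million: float,
                  days_per_month: int = 30) -> tuple[float, float]:
    """Return (daily, monthly) cost in USD.

    days_per_month=30 is an assumption; the page does not state its month length.
    """
    daily = tokens_per_day / 1_000_000 * price_per_million
    monthly = daily * days_per_month
    return daily, monthly


# 1M tokens per day at the rounded blended price listed on the page.
daily, monthly = estimate_cost(1_000_000, 0.20)
print(f"Daily: ${daily:.2f}, Monthly: ${monthly:.2f}")
```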

vs. Similar Models

Google: Gemini 2.5 Flash Lite (Q: 0.0)
$0.17 (-12%)
Mistral Small 3 (Q: 0.0)
$0.10 (-47%)
Nova Lite (Q: 0.0)
$0.10 (-47%)
OpenAI: GPT-4o-mini (Q: 0.0)
$0.26 (+33%)

Performance

91

tokens/sec

Faster than 49% of models

0.63

seconds

Faster than 75% of models

0.63

seconds

Faster than 83% of models

Market Median

94 tok/s

4% slower

Median TTFT

1.11s

44% faster

Throughput/Dollar

459

tok/s per $/1M
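
The throughput-per-dollar figure appears to be output speed divided by the blended price. A minimal sketch, assuming that definition (the page does not spell it out) and using a hypothetical function name:

```python
def throughput_per_dollar(tokens_per_second: float,
                          blended_price_per_million: float) -> float:
    """Tokens/sec delivered per dollar of blended price (USD per 1M tokens)."""
    if blended_price_per_million <= 0:
        raise ValueError("price must be positive")
    return tokens_per_second / blended_price_per_million


# Listed values: 91 tok/s and a blended price shown as $0.20 per 1M tokens.
print(round(throughput_per_dollar(91, 0.20)))
```

With the rounded $0.20 this yields 455, slightly below the listed 459; the gap is consistent with the page dividing by an unrounded blended price just under $0.20.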

Speed Comparison

MiMo-V2-Flash (Reasoning)
91 tok/s (+0%)
Hermes 4 - Llama-3.1 70B (Reasoning)
90 tok/s (-0%)
Llama 3.3 Instruct 70B
92 tok/s (+1%)

Benchmarks

MMLU-Pro
66.4%
GPQA Diamond
49.1%
HLE
3.6%
LiveCodeBench
26.9%
SciCode
27.7%
TerminalBench Hard
0.0%
MATH-500
Not evaluated
AIME
Not evaluated
AIME 2025
11.3%
IFBench
29.0%
Long Context Recall
2.0%
Tau2
21.6%
