
MiMo-V2-Omni

Xiaomi · Released 2026-03-19
262K context · Multimodal

About

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Pricing

Input: $0.40 per 1M tokens
Output: $2.00 per 1M tokens
Blended: $0.80 per 1M tokens
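
The page does not say how the blended figure is derived. A minimal sketch, assuming a 3:1 input-to-output token mix (an assumption that happens to reproduce the listed $0.80; the function name and weights are illustrative, not taken from the page):

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix.

    The default 3:1 mix is an assumption, not something the listing states.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Listed prices: $0.40 input, $2.00 output (per 1M tokens).
print(f"${blended_price(0.40, 2.00):.2f} per 1M tokens")
```

A workload with a different mix (for example, long generations from short prompts) would see a higher effective price than the blended figure.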

Cheaper than 42% of models. Median price is $0.54/1M tokens.

Cost Calculator

Tokens per day: 1M (adjustable from 100K to 100M)

Daily: $0.80
Monthly: $24.00
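
The calculator figures are consistent with the blended price applied to the daily token count over a 30-day month. A rough sketch under that assumption (the 30-day month and the function name are illustrative):

```python
def token_cost(tokens_per_day: int, price_per_million: float,
               days_per_month: int = 30) -> tuple[float, float]:
    """Return (daily, monthly) cost in dollars at a flat per-1M-token price."""
    daily = tokens_per_day / 1_000_000 * price_per_million
    return daily, daily * days_per_month


# 1M tokens per day at the blended $0.80 per 1M tokens.
daily, monthly = token_cost(1_000_000, 0.80)
print(f"Daily: ${daily:.2f}, Monthly: ${monthly:.2f}")
```

Because input and output are priced differently, a more careful estimate would price the two token streams separately rather than use the blended rate.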

vs. Similar Models

Gemini 3.5 Flash (minimal): $3.38 (+322%), Q: -0.1
Anthropic: Claude Opus 4.5: $10.00 (+1150%), Q: -0.3
Claude 4.5 Sonnet (Reasoning): $6.00 (+650%), Q: -0.3
OpenAI: GPT-5.1-Codex: $3.44 (+330%), Q: -0.3
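
The percentages above match the relative difference between each model's price and MiMo-V2-Omni's blended $0.80. A small sketch of that comparison, assuming the listed prices are blended per-1M-token figures (the helper name is hypothetical):

```python
def pct_more_expensive(other_price: float, base_price: float) -> float:
    """Percentage by which other_price exceeds base_price."""
    return (other_price - base_price) / base_price * 100


BASE = 0.80  # MiMo-V2-Omni blended price per 1M tokens

similar = {
    "Gemini 3.5 Flash (minimal)": 3.38,
    "Anthropic: Claude Opus 4.5": 10.00,
    "Claude 4.5 Sonnet (Reasoning)": 6.00,
    "OpenAI: GPT-5.1-Codex": 3.44,
}

for name, price in similar.items():
    print(f"{name}: {pct_more_expensive(price, BASE):+.1f}%")
```

The results land within a rounding step of the listed percentages; the page presumably rounds from unrounded prices.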

Performance

88 tokens/sec (faster than 46% of models)
1.77 seconds to first token (faster than 27% of models)
24.59 seconds (faster than 20% of models)

Market Median: 95 tok/s (7% slower)
Median TTFT: 1.11s (60% slower)
Throughput/Dollar: 110 tok/s per $/1M

Speed Comparison

Mistral 7B Instruct: 87 tok/s (-0%)
Llama 3.3 Instruct 70B: 88 tok/s (+1%)
GPT-5 (low): 89 tok/s (+1%)

Context Window

262K tokens (larger than 62% of models)

Max Output

66K tokens (25% of context)
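
The 25% figure follows from the two limits. A quick check, with one labeled assumption: if "262K" and "66K" are rounded forms of 262,144 and 65,536 (powers of two, not confirmed by the page), the ratio is exactly one quarter.

```python
def output_share(max_output: int, context_window: int) -> float:
    """Maximum output length as a percentage of the context window."""
    return max_output / context_window * 100


# Using the rounded values shown on the page.
print(f"{output_share(66_000, 262_000):.1f}%")

# Assumed exact values (2**16 and 2**18); not stated in the listing.
print(f"{output_share(65_536, 262_144):.1f}%")
```

Either way, a request that uses the full output budget leaves roughly three quarters of the window for the prompt.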

Benchmarks

MMLU-Pro: Not evaluated
GPQA Diamond: 82.8%
HLE: 19.9%
LiveCodeBench: Not evaluated
SciCode: 36.7%
TerminalBench Hard: 34.8%
MATH-500: Not evaluated
AIME: Not evaluated
AIME 2025: Not evaluated
IFBench: 53.5%
Long Context Recall: 66.7%
Tau2: 91.2%
