Claude Opus 4.7 takes the top spot by a hair, Grok 4.20 rewrites the speed equation
Claude Opus 4.7 edges out Gemini 3.1 Pro Preview on quality while Grok 4.20 hits 222 tok/s. Weekly LLM market briefing for April 20, 2026.
The top three are now separated by half a point
Claude Opus 4.7 (Anthropic) leads the quality index at 57.3, with Gemini 3.1 Pro Preview (Google) at 57.2 and GPT-5.4 (OpenAI) at 56.8. That's a 0.5-point spread across three different providers. The practical difference at this tier is negligible for most workloads; the real differentiators are price and throughput.
Where the money argument gets interesting
| Model | Quality | Price/1M input | Speed |
|---|---|---|---|
| Claude Opus 4.7 | 57.3 | $10.00 | 53 tok/s |
| Gemini 3.1 Pro Preview | 57.2 | $4.50 | 134 tok/s |
| GPT-5.4 | 56.8 | $5.63 | 86 tok/s |
| Grok 4.20 | 49.3 | $3.00 | 222 tok/s |
Gemini 3.1 Pro Preview delivers 99.8% of Claude Opus 4.7's quality at 45% of the input price and 2.5x the inference speed. For batch processing, RAG pipelines, or any workload where you're paying per token at scale, it's the clear pick. Claude Opus 4.7 costs $10/M input tokens and runs at 53 tok/s; I struggle to justify that premium when the quality gap is 0.1 points.
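The ratios in this comparison can be reproduced directly from the table. Here is a minimal Python sketch; the figures are copied from the table, and `relative_to` is a hypothetical helper name introduced only for illustration.

```python
# Figures copied from the comparison table: quality index,
# price in USD per 1M input tokens, and speed in tokens per second.
MODELS = {
    "Claude Opus 4.7": {"quality": 57.3, "price": 10.00, "speed": 53},
    "Gemini 3.1 Pro Preview": {"quality": 57.2, "price": 4.50, "speed": 134},
    "GPT-5.4": {"quality": 56.8, "price": 5.63, "speed": 86},
    "Grok 4.20": {"quality": 49.3, "price": 3.00, "speed": 222},
}


def relative_to(baseline, candidate, models=MODELS):
    """Express the candidate's quality, price, and speed relative to the baseline."""
    b, c = models[baseline], models[candidate]
    return {
        "quality_pct": round(100 * c["quality"] / b["quality"], 1),
        "cost_pct": round(100 * c["price"] / b["price"], 1),
        "speed_x": round(c["speed"] / b["speed"], 1),
    }


ratios = relative_to("Claude Opus 4.7", "Gemini 3.1 Pro Preview")
print(ratios)
```

Swapping in any other pair of names from the table gives the same three ratios for that pairing, which is a quick way to check whether a premium is worth it for your own baseline.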
GPT-5.4 sits in the middle on every axis. Not the cheapest, not the fastest, not the highest quality. It's a reasonable default if you're already on OpenAI infrastructure, but it's no longer the obvious choice for anything specific.
Grok 4.20: fastest model in the index by a wide margin
Grok 4.20 from xAI hits 222 tok/s at $3.00/M tokens. That's roughly 66% faster than the next-fastest model (Gemini 3.1 Pro Preview at 134 tok/s). The quality score of 49.3 puts it below the frontier tier, but for latency-sensitive applications like interactive agents, autocomplete, or any pipeline where iteration speed matters more than peak accuracy, nothing else comes close.
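To put the throughput figures in wall-clock terms, the rough sketch below estimates how long each model would take to generate one response. The 1,000-token response length and the helper name `generation_seconds` are illustrative assumptions, and the estimate ignores time to first token, network latency, and queueing, none of which the index figures cover.

```python
# Output speeds in tokens per second, copied from the comparison table.
SPEEDS_TOK_PER_S = {
    "Grok 4.20": 222,
    "Gemini 3.1 Pro Preview": 134,
    "GPT-5.4": 86,
    "Claude Opus 4.7": 53,
}


def generation_seconds(output_tokens, tok_per_s):
    """Pure decode time: tokens to generate divided by sustained throughput."""
    return output_tokens / tok_per_s


RESPONSE_TOKENS = 1_000  # hypothetical response length, not from the index

for model, speed in SPEEDS_TOK_PER_S.items():
    print(f"{model}: {generation_seconds(RESPONSE_TOKENS, speed):.1f} s")
```

Under these assumptions a 1,000-token response takes about 4.5 seconds on Grok 4.20, about 7.5 seconds on Gemini 3.1 Pro Preview, and close to 19 seconds on Claude Opus 4.7, which is the gap an interactive user would actually feel.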
The budget tier keeps compressing
MiniMax M2.7 scores 49.6 quality at $0.52/M tokens. That's 87.3% of GPT-5.4's quality for under a tenth of the price. Qwen3.6 Plus at $0.73/M and GLM 5 at $1.11/M round out a sub-$1.50 tier of three open-weight or budget models, all scoring within half a point of 50 quality. For classification, summarization, and structured extraction at high volume, this tier now handles what required frontier models a year ago.
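For a sense of what these prices mean at volume, here is a back-of-the-envelope sketch. The 2-billion-token monthly workload and the helper name `input_cost_usd` are hypothetical, and the quoted prices are treated as per-million input-token rates, following the table's column header; output-token pricing is not covered here.

```python
# Prices in USD per 1M tokens, as quoted in this briefing.
PRICE_PER_MILLION = {
    "MiniMax M2.7": 0.52,
    "Qwen3.6 Plus": 0.73,
    "GLM 5": 1.11,
    "GPT-5.4": 5.63,
}


def input_cost_usd(tokens, price_per_million):
    """Cost of processing `tokens` input tokens at a per-million-token price."""
    return round(tokens / 1_000_000 * price_per_million, 2)


MONTHLY_TOKENS = 2_000_000_000  # hypothetical workload, not from the index

for model, price in PRICE_PER_MILLION.items():
    print(f"{model}: ${input_cost_usd(MONTHLY_TOKENS, price):,.2f} per month")
```

At that assumed volume, MiniMax M2.7 comes to about $1,040 a month against about $11,260 for GPT-5.4, so the decision turns on whether the quality gap matters for the task.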
GPT-5.3-Codex holds the code-specific slot
GPT-5.3-Codex, at 53.6 quality and $4.81/M tokens, remains OpenAI's dedicated coding model. On general quality it sits between GPT-5.2 (51.3) and GPT-5.4 (56.8): below the frontier trio, above the budget tier, and cheaper than 5.4. It is worth considering if your pipeline is code-heavy and you need the specialized fine-tuning.
What to watch
- Gemini 3.1 Pro is still labeled "Preview." If Google holds the $4.50 price at GA, it becomes the default recommendation for cost-conscious teams running frontier-quality workloads.
- Muse Spark (Meta) posted 52.1 quality with no pricing or speed data yet. An open-source model at that quality level with competitive inference cost would reshape the budget tier.
- Anthropic's pricing is increasingly hard to defend. It has three models at $10/M tokens, and only Opus 4.7 takes the top spot, by just 0.1 points.
Find the right model for your workload on the LLM Selector or browse the full rankings on Explore.