Claude Opus 4.7 takes the top spot by a hair, Grok 4.20 rewrites the speed equation
Claude Opus 4.7 edges out Gemini 3.1 Pro Preview on quality while Grok 4.20 hits 222 tok/s. Weekly LLM market briefing for April 20, 2026.
The top three are now separated by half a point
Claude Opus 4.7 (Anthropic) leads the quality index at 57.3, with Gemini 3.1 Pro Preview (Google) at 57.2 and GPT-5.4 (OpenAI) at 56.8. That's a 0.5-point spread across three different providers. The practical difference at this tier is negligible for most workloads; the real differentiators are price and throughput.
Where the money argument gets interesting
| Model | Quality | Price/1M input | Speed |
|---|---|---|---|
| Claude Opus 4.7 | 57.3 | $10.00 | 53 tok/s |
| Gemini 3.1 Pro Preview | 57.2 | $4.50 | 134 tok/s |
| GPT-5.4 | 56.8 | $5.63 | 86 tok/s |
| Grok 4.20 | 49.3 | $3.00 | 222 tok/s |
Gemini 3.1 Pro Preview delivers 99.8% of Claude Opus 4.7's quality at 45% of the cost and 2.5x the inference speed. For batch processing, RAG pipelines, or any workload where you're paying per token at scale, that's the clear pick. At $10/M tokens and 53 tok/s, Claude Opus 4.7's premium is hard to justify when the quality gap is 0.1 points.
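The ratios above fall straight out of the table. A quick back-of-envelope check (figures copied from the table; nothing here is an official benchmark):

```python
# Reproduce the Gemini-vs-Claude comparison from the table above.
claude = {"quality": 57.3, "price": 10.00, "speed": 53}
gemini = {"quality": 57.2, "price": 4.50, "speed": 134}

quality_ratio = gemini["quality"] / claude["quality"]  # relative quality
cost_ratio = gemini["price"] / claude["price"]         # relative $/M input
speed_ratio = gemini["speed"] / claude["speed"]        # relative throughput

print(f"{quality_ratio:.1%} of the quality at {cost_ratio:.0%} of the cost, "
      f"{speed_ratio:.1f}x the speed")
# prints "99.8% of the quality at 45% of the cost, 2.5x the speed"
```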
GPT-5.4 sits in the middle on every axis. Not the cheapest, not the fastest, not the highest quality. It's a reasonable default if you're already on OpenAI infrastructure, but it's no longer the obvious choice for anything specific.
Grok 4.20: fastest model in the index by a wide margin
Grok 4.20 from xAI hits 222 tok/s at $3.00/M tokens. That's 66% faster than the next-closest model (Gemini 3.1 Pro Preview at 134 tok/s). The quality score of 49.3 puts it below the frontier tier, but for latency-sensitive applications like interactive agents, autocomplete, or any pipeline where iteration speed matters more than peak accuracy, nothing else comes close.
The budget tier keeps compressing
MiniMax M2.7 scores 49.6 quality at $0.52/M tokens. That's 87.3% of GPT-5.4's quality for under a tenth of the price. Qwen3.6 Plus at $0.73/M and another budget entrant at $1.11/M round out a sub-$1.50 tier where three open-weight or budget models cluster between 49.8 and 50.0 quality. For classification, summarization, and structured extraction at high volume, this tier now handles what required frontier models a year ago.
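One way to see the compression is quality points per input-token dollar. A rough ranking sketch using only the figures cited in this briefing (the metric itself is my own rule of thumb, not part of the index):

```python
# Quality points per $/M input tokens, from the numbers cited above.
models = {
    "Claude Opus 4.7": (57.3, 10.00),
    "Gemini 3.1 Pro Preview": (57.2, 4.50),
    "GPT-5.4": (56.8, 5.63),
    "Grok 4.20": (49.3, 3.00),
    "MiniMax M2.7": (49.6, 0.52),
}
ranked = sorted(models, key=lambda n: models[n][0] / models[n][1], reverse=True)
for name in ranked:
    quality, price = models[name]
    print(f"{name:24s} {quality / price:6.1f} quality points per $/M")
```

MiniMax M2.7 tops this ranking by a factor of roughly six over the next model, which is the whole budget-tier argument in one number.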