Gemini 3.1 Pro matches Claude Opus 4.7 quality at less than half the price
Google's Gemini 3.1 Pro Preview closes to within 0.1 points of Claude Opus 4.7 while costing $4.50 vs $10.00/M tokens. Plus: Grok 4.3 emerges as the speed-value pick.
The price-quality gap is collapsing in the mid-tier
Gemini 3.1 Pro Preview (Google) now scores 57.2 on quality index — just 0.1 points behind Claude Opus 4.7 (Anthropic) at 57.3. The cost difference: $4.50 vs $10.00 per million tokens. At 131 tok/s versus 59 tok/s, Gemini also delivers inference more than twice as fast. For any workload where you're not squeezing the last fraction of quality, the economic case for Claude Opus at this tier has eroded significantly.
The leaderboard this week
| Model | Quality | Price/M | Speed | Best for |
|---|---|---|---|---|
| GPT-5.5 | 60.2 | $11.25 | 75 tok/s | Peak quality, cost-insensitive |
| Claude Opus 4.7 | 57.3 | $10.00 | 59 tok/s | Anthropic ecosystem lock-in |
| Gemini 3.1 Pro Preview | 57.2 | $4.50 | 131 tok/s | Throughput-heavy production |
| Grok 4.3 | 53.2 | $1.56 | 112 tok/s | High-volume, cost-constrained |
Grok 4.3 is the quiet mover
Grok 4.3 (xAI) sits at 53.2 quality, $1.56/M tokens, and 112 tok/s. That's faster than GPT-5.5 and cheaper than nearly everything above it. The quality gap to Gemini 3.1 Pro is 4 points, which matters for complex reasoning chains. But for classification, extraction, and structured output tasks where 53+ quality suffices, Grok delivers the best tokens-per-dollar at high throughput. Fewer retries at 112 tok/s means tighter iteration loops in agent architectures.
The open-source bracket
Kimi K2.6 (MoonshotAI) leads open-weight models at 53.9 quality and $1.43/M tokens, though inference latency at 31 tok/s limits its use in real-time applications. Qwen3.6 Max Preview (Alibaba) offers 51.8 quality at $2.92/M — nearly double Kimi's price for lower quality. For self-hosted deployments where you control the inference stack, Kimi K2.6 is the clear pick if you can tolerate the throughput constraint.
GPT-5.5 reasoning tiers: diminishing returns confirmed
The spread between GPT-5.5 (60.2), GPT-5.5 high (58.9), and GPT-5.5 medium (56.7) all at $11.25/M tokens makes the default tier the only rational choice unless you need deterministic latency from the medium tier. I covered this in depth last week, but it bears repeating: same price, 3.5 quality points difference. Always use default.
What to watch
Gemini 3.1 Pro leaving preview. If Google holds the $4.50 price at GA, it becomes the default recommendation for production workloads in the 55-58 quality range.
Grok's next move. At $1.56/M and 112 tok/s, xAI is positioned to capture batch processing workloads. Any quality bump above 55 makes it a serious threat to Gemini's value proposition.
Meta's Muse Spark. Listed at 52.1 quality with no pricing or speed data yet. If Meta prices this aggressively as an open model, the sub-$1/M tier could get interesting fast.
Find the right model for your workload constraints with the LLM Selector.
Stay in the loop
Weekly LLM analysis delivered to your inbox. No spam.