GPT-5.5 opens a 3-point quality gap, Gemini 3.1 Pro undercuts everyone above it
GPT-5.5 leads at 60.2 quality but costs $11.25/M tokens. Gemini 3.1 Pro matches Opus 4.7 at under half the price. Weekly LLM briefing for April 27.
GPT-5.5 (OpenAI) now sits at 60.2 on the quality index, a full 2.9 points above its nearest competitor. That's the widest gap at the top of the leaderboard in weeks. But the more consequential move is one tier down, where Gemini 3.1 Pro Preview (Google) matches Claude Opus 4.7's quality at less than half the cost.
The top tier costs $10+ per million tokens. Is it worth it?
Three models cluster above 57: GPT-5.5 at 60.2, Claude Opus 4.7 at 57.3, and Gemini 3.1 Pro at 57.2. Only one of them charges less than $5/M tokens.
| Model | Quality | Price/M input | Speed |
|---|---|---|---|
| GPT-5.5 | 60.2 | $11.25 | 84 tok/s |
| Claude Opus 4.7 | 57.3 | $10.00 | 59 tok/s |
| Gemini 3.1 Pro Preview | 57.2 | $4.50 | 132 tok/s |
| GPT-5.4 | 56.8 | $5.63 | 85 tok/s |
Gemini 3.1 Pro trails Opus 4.7 by 0.1 quality points, costs 55% less, and generates tokens at 132 tok/s — more than double Opus's 59 tok/s. For any workload where inference latency matters and you're running thousands of requests, the math is straightforward. Gemini 3.1 Pro is the best model under $5/M tokens for general-purpose work, and it isn't close.
GPT-5.5's quality lead is real but expensive. The 3.0-point gap over Gemini justifies the premium only when you need peak accuracy on hard tasks and cost is secondary. For batch processing or high-volume pipelines, you're paying 2.5x the price per token for a ~5% quality improvement.
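The tradeoff arithmetic above can be sketched in a few lines, plugging in the quality and price figures from the table:

```python
# Illustrative sketch: cost vs. quality tradeoff between GPT-5.5 and
# Gemini 3.1 Pro, using the quality index and $/M input token prices
# from the table above.
gpt55 = {"quality": 60.2, "price": 11.25}
gemini = {"quality": 57.2, "price": 4.50}

price_ratio = gpt55["price"] / gemini["price"]       # 2.5x the price
quality_gain = gpt55["quality"] - gemini["quality"]  # 3.0 points
relative_gain = quality_gain / gemini["quality"]     # ~5.2%

print(f"{price_ratio:.1f}x the price for a {relative_gain:.1%} quality gain")
# → 2.5x the price for a 5.2% quality gain
```

For a pipeline processing, say, 100M tokens a month, that ratio is the difference between a ~$450 and a ~$1,125 input-token bill for a marginal quality bump.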
The budget tier got more interesting
DeepSeek V4 Pro at $0.54/M tokens and 51.5 quality remains the cheapest way to get above 50 on the index. Kimi K2.6 at $1.72/M offers 53.9 quality with 138 tok/s throughput, the fastest in the entire table. MiMo-V2.5-Pro (Xiaomi) slots in between at $1.50/M and 53.8 quality but only 66 tok/s.
If you're optimizing for cost per quality point, DeepSeek V4 Pro wins at roughly $0.01 per quality point per million tokens. Kimi K2.6 is the pick when you need both speed and quality under $2/M — its 138 tok/s throughput means faster iteration loops and lower wall-clock time on sequential chains.
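The cost-per-quality-point comparison can be reproduced directly from the figures quoted above:

```python
# Illustrative sketch: dollars per quality point per million input tokens
# for the budget tier, using the prices and quality scores quoted above.
models = {
    "DeepSeek V4 Pro": {"price": 0.54, "quality": 51.5},
    "Kimi K2.6":       {"price": 1.72, "quality": 53.9},
    "MiMo-V2.5-Pro":   {"price": 1.50, "quality": 53.8},
}

# Sort cheapest-per-quality-point first.
ranked = sorted(models.items(), key=lambda kv: kv[1]["price"] / kv[1]["quality"])
for name, m in ranked:
    print(f"{name}: ${m['price'] / m['quality']:.3f} per quality point / M tokens")
# → DeepSeek V4 Pro: $0.010 per quality point / M tokens
# → MiMo-V2.5-Pro: $0.028 per quality point / M tokens
# → Kimi K2.6: $0.032 per quality point / M tokens
```

DeepSeek V4 Pro's lead on this metric is roughly 3x, which is why it stays the default when raw cost efficiency is the only criterion.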
Stay in the loop
Weekly LLM analysis delivered to your inbox. No spam.