GPT-5.5 takes the quality crown at $11.25, but the real action is at the bottom of the table
GPT-5.5 leads on quality at 60.2 but costs nearly 8x as much as Kimi K2.6. This week's briefing breaks down who should care.
The big picture
GPT-5.5 (OpenAI) now sits at 60.2 quality index, the highest score on the board. It costs $11.25/M tokens and outputs at 69 tok/s. Whether that quality gap justifies the price depends entirely on your workload, because three models under $2/M tokens are clustering around 53-54 quality and closing fast.
GPT-5.5: quality leader, price outlier
OpenAI's new top model opens a 2.9-point quality gap over Claude Opus 4.7 (57.3) and a 3.0-point gap over Gemini 3.1 Pro Preview (57.2). That's meaningful. But the pricing tells a different story: $11.25/M tokens puts it at 2.5x Gemini 3.1 Pro's $4.50 and nearly 8x Kimi K2.6's $1.44.
The three GPT-5.5 variants are puzzling. The "high" effort mode scores lower (58.9) than the default (60.2), and "medium" drops to 56.7, all at the same $11.25/M price. If you're paying a premium, stick with the default configuration.
| Model | Quality | Price/M | Speed | Open source |
|---|---|---|---|---|
| GPT-5.5 | 60.2 | $11.25 | 69 tok/s | No |
| Claude Opus 4.7 | 57.3 | $10.00 | 61 tok/s | No |
| Gemini 3.1 Pro | 57.2 | $4.50 | 127 tok/s | No |
| Kimi K2.6 | 53.9 | $1.44 | 41 tok/s | Yes |
| Grok 4.3 | 53.2 | $1.56 | 91 tok/s | No |
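
The gaps and price multiples quoted above can be reproduced directly from the table. The snippet below is a minimal sketch: the dictionary simply restates the table's quality and price columns, and `compare_to_leader` is an illustrative helper name, not part of any tool mentioned here.

```python
# Figures copied from the table: (quality index, USD per million tokens).
MODELS = {
    "GPT-5.5": (60.2, 11.25),
    "Claude Opus 4.7": (57.3, 10.00),
    "Gemini 3.1 Pro": (57.2, 4.50),
    "Kimi K2.6": (53.9, 1.44),
    "Grok 4.3": (53.2, 1.56),
}


def compare_to_leader(models, leader="GPT-5.5"):
    """Return {model: (quality_gap, price_multiple)} relative to the leader.

    quality_gap is how many index points the model trails the leader by;
    price_multiple is how many times more the leader costs per million tokens.
    """
    lead_quality, lead_price = models[leader]
    return {
        name: (round(lead_quality - quality, 1), round(lead_price / price, 2))
        for name, (quality, price) in models.items()
        if name != leader
    }


for name, (gap, multiple) in compare_to_leader(MODELS).items():
    print(f"{name}: trails by {gap} points; GPT-5.5 costs {multiple}x as much")
```

Run as written, this gives a 3.0-point gap and a 2.5x price multiple for Gemini 3.1 Pro, and a 7.81x multiple for Kimi K2.6, which is where the "nearly 8x" figure comes from.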
The sub-$2 tier is getting crowded and competitive
Kimi K2.6 (MoonshotAI) at 53.9 quality for $1.44/M tokens, Grok 4.3 (xAI) at 53.2 for $1.56, and MiMo-V2.5-Pro (Xiaomi) at 53.8 for $1.50 are all within one quality point of each other. For batch processing, RAG pipelines, or any workload where retries dominate cost, these models deliver roughly 90% of GPT-5.4's quality at 25-27% of its price.
Grok 4.3 stands out for inference speed: 91 tok/s versus Kimi's 41 tok/s. If your pipeline is latency-sensitive, that's a 2.2x output-speed advantage at essentially the same price and quality. Kimi's open-source licensing is the counterweight for teams that need self-hosted deployment.
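
To put that speed gap in wall-clock terms, here is a rough sketch that converts tok/s into time per response. It assumes a steady decode rate and ignores time to first token and network overhead; the 1,000-token response length is a hypothetical figure chosen for illustration.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to produce output_tokens at a steady decode rate.

    Simplification: ignores time to first token, queuing, and network latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Output speeds from the table, in tokens per second.
SPEEDS = {"Grok 4.3": 91, "Kimi K2.6": 41}
RESPONSE_TOKENS = 1_000  # hypothetical response length

for name, speed in SPEEDS.items():
    seconds = generation_seconds(RESPONSE_TOKENS, speed)
    print(f"{name}: about {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under those assumptions, a 1,000-token response takes about 11 seconds on Grok 4.3 and about 24 seconds on Kimi K2.6.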
Gemini 3.1 Pro remains the best mid-tier value
At 57.2 quality, $4.50/M tokens, and 127 tok/s, Gemini 3.1 Pro Preview is the fastest model in the top five and costs less than half of both GPT-5.5 and Claude Opus 4.7. For interactive applications where inference latency matters, nothing else at this quality level comes close on throughput. I covered this in detail last week, but the arrival of GPT-5.5 hasn't changed the calculus: Gemini still dominates quality-per-dollar in the mid tier.
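
One crude way to check the quality-per-dollar claim is to divide each model's quality index by its price. The sketch below does that for the three highest-scoring models in the table. Treating index points per dollar as a value metric is a simplifying assumption, since the quality index is not necessarily linear, and the function names are illustrative.

```python
# (quality index, USD per million tokens) for the three highest-scoring models.
TOP_TIER = {
    "GPT-5.5": (60.2, 11.25),
    "Claude Opus 4.7": (57.3, 10.00),
    "Gemini 3.1 Pro": (57.2, 4.50),
}


def quality_per_dollar(quality: float, price_per_million: float) -> float:
    """Quality index points bought per dollar spent on one million tokens."""
    return quality / price_per_million


def rank_by_value(models):
    """Return (name, quality_per_dollar) pairs, best value first."""
    scored = [(name, quality_per_dollar(q, p)) for name, (q, p) in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for name, value in rank_by_value(TOP_TIER):
    print(f"{name}: {value:.2f} quality points per dollar")
```

On these numbers Gemini 3.1 Pro comes out at roughly 12.7 points per dollar, more than double Claude Opus 4.7 (about 5.7) and GPT-5.5 (about 5.4).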
What to watch
- GPT-5.5 effort modes: The inverted scoring (high < default) suggests these configurations aren't fully tuned. Expect adjustments or documentation clarifications from OpenAI.
- Kimi K2.6 adoption: An open-source model at 53.9 quality for $1.44/M is the cheapest path to self-hosting at this tier. Watch for fine-tuned variants.
- Qwen3.6 Max Preview: Alibaba's entry at 51.8 quality and $2.92/M is open-source but slow at 37 tok/s. If speed improves, it becomes a serious mid-tier contender.