Skip to main content
Back to Blog

GPT-5.5 launches at 60.2 quality but Opus 4.8 keeps the crown

OpenAI's GPT-5.5 lands second on quality while costing 12% more than Claude Opus 4.8. Gemini 3.1 Pro still wins on price-per-quality.

FindLLMJune 8, 2026
gpt-5.5claude-opus-4.8gemini-3.1-prollm-pricingmodel-comparison

OpenAI shipped GPT-5.5 (OpenAI) this week at 60.2 quality and $11.25/1M tokens. It lands second on the quality index, 1.2 points behind Claude Opus 4.8 at 61.4, and costs 12.5% more per million tokens. The top of the leaderboard is now a two-horse race where you pay a premium for marginal gains.

The numbers that matter

ModelQualityPrice/1MSpeed
Claude Opus 4.861.4$10.0066 tok/s
GPT-5.560.2$11.2562 tok/s
Gemini 3.1 Pro Preview57.2$4.50136 tok/s
Qwen3.7 Max56.6$1.88102 tok/s

Quality comparison

The GPT-5.5 reasoning tiers tell their own story. The default config scores 60.2, but high drops to 58.9 and medium to 56.7. That inversion is unusual and worth flagging: spending more on reasoning effort here buys you lower benchmark quality, not higher. For most pipelines, the default tier is the one to ship.

Where the value actually sits

Opus 4.8 and GPT-5.5 trade roughly equal output speed (66 vs 62 tok/s) and near-equal quality. If you're already on Anthropic, GPT-5.5 gives you no reason to switch — you'd pay more for 1.2 fewer quality points.

The real story is below the frontier. Gemini 3.1 Pro Preview sits at 57.2 quality for $4.50/1M and runs at 136 tok/s. Against GPT-5.5 that's a 3-point quality gap for 60% off and more than double the throughput.

Price comparison

For code-heavy or agentic workloads where you fan out many calls, the frontier premium rarely survives the math. Three quality points won't offset doubling your token bill across a retry-heavy pipeline.

The open-weight bracket

Qwen3.7 Max (Alibaba) holds 56.6 quality at $1.88/1M and is open source. That's within 0.6 points of Gemini 3.1 Pro at less than half the price, with weights you can self-host. For batch jobs where you control the hardware, it's the cheapest path to near-frontier output.

MiniMax M3 at $0.52/1M and 54.7 quality remains the floor for cost-sensitive work, though its 44 tok/s throughput makes it a poor fit for interactive loops.

What to watch

  • Whether OpenAI corrects the GPT-5.5 reasoning-tier inversion or whether high is genuinely tuned for tasks the index doesn't capture.
  • Gemini 3.1 Pro is still labeled Preview. A GA release at the same $4.50 price would pressure both frontier models on cost-per-quality.
  • Qwen3.7 Max closing the last 0.6-point gap to Gemini would make the frontier tier hard to justify for anything but the most quality-critical calls.

Running the price-quality tradeoff yourself? Start with the LLM Selector or browse the full board on Explore.

Stay in the loop

Weekly LLM analysis delivered to your inbox. No spam.