Claude Opus 4.8 takes the quality lead as Gemini 3.1 Pro undercuts it by 55%
Claude Opus 4.8 tops quality at 61.4 but costs $10/M. Gemini 3.1 Pro hits 57.2 at $4.50. Here's where the price-performance line actually sits this week.
Claude Opus 4.8 (Anthropic) leads quality this week at 61.4, edging past GPT-5.5 (OpenAI) at 60.2. Both cost $10/M or more. The more interesting number: Gemini 3.1 Pro Preview (Google) lands at 57.2 quality for $4.50/M, within 4.2 points of the top while costing 55% less than Opus.
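The headline figures are simple ratios. Here is a minimal Python sketch that recomputes them from the numbers quoted above. It assumes prices are list prices in USD per 1M tokens and quality is the index's own score. The helper name `percent_cheaper` is illustrative only.

```python
# Figures quoted in this week's comparison (USD per 1M tokens, quality score).
OPUS_PRICE, OPUS_QUALITY = 10.00, 61.4
GEMINI_PRICE, GEMINI_QUALITY = 4.50, 57.2


def percent_cheaper(price: float, reference: float) -> float:
    """How much cheaper `price` is than `reference`, as a percentage."""
    return (1 - price / reference) * 100


gap = OPUS_QUALITY - GEMINI_QUALITY
savings = percent_cheaper(GEMINI_PRICE, OPUS_PRICE)
print(f"quality gap: {gap:.1f} points, price: {savings:.0f}% cheaper")
```

Running it prints a 4.2-point gap and a 55% saving, which matches the figures in the headline.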
The top tier is getting expensive to defend
Anthropic's lead is real but thin. Opus 4.8 scores 61.4 against GPT-5.5's 60.2, a 1.2-point gap that is unlikely to show up in most production workloads. Both are also slow for frontier models: 65 and 66 tok/s, respectively.
For anything latency-sensitive, neither is the right pick. The $10–11.25/M premium buys you the last 3 to 4.2 quality points over Gemini 3.1 Pro and not much else.
Where the value actually is
Gemini 3.1 Pro is the model I'd reach for as a default this week. At 57.2 quality, $4.50/M, and 148 tok/s, it delivers more than 2x the throughput of either top-tier model while giving up only 3 to 4.2 points of quality.
| Model | Quality | Price (per 1M tokens) | Speed (tok/s) |
|---|---|---|---|
| Claude Opus 4.8 | 61.4 | $10.00 | 65 |
| GPT-5.5 | 60.2 | $11.25 | 66 |
| Gemini 3.1 Pro | 57.2 | $4.50 | 148 |
| Qwen3.7 Max | 56.6 | $1.88 | 190 |
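To check the "more than 2x" throughput claim, here is a small Python sketch. It holds the table's figures in a dictionary and divides tok/s values. The `MODELS` layout and the `speedup` helper are illustrative assumptions, not any benchmark's API.

```python
# Figures from the table above: (quality, USD per 1M tokens, tokens per second).
MODELS = {
    "Claude Opus 4.8": (61.4, 10.00, 65),
    "GPT-5.5": (60.2, 11.25, 66),
    "Gemini 3.1 Pro": (57.2, 4.50, 148),
    "Qwen3.7 Max": (56.6, 1.88, 190),
}


def speedup(model: str, baseline: str) -> float:
    """Throughput of `model` as a multiple of `baseline` (ratio of tok/s)."""
    return MODELS[model][2] / MODELS[baseline][2]


for top in ("Claude Opus 4.8", "GPT-5.5"):
    print(f"Gemini 3.1 Pro vs {top}: {speedup('Gemini 3.1 Pro', top):.2f}x")
```

The output is about 2.28x against Opus 4.8 and 2.24x against GPT-5.5.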
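The faster iteration loop matters for agentic pipelines that chain dozens of sequential calls. At 148 tok/s versus 65, Gemini cuts the token-generation share of wall-clock time on multi-step chains by more than half (65/148 ≈ 0.44). Fixed per-call overheads such as network latency and tool execution narrow that gain end to end, so "roughly in half" is the realistic expectation.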
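A rough way to model the chain effect is to treat each call's generation time as output tokens divided by throughput and sum over sequential calls. The Python sketch below is a deliberately simplified model. It assumes every call emits the same number of output tokens and ignores time to first token, network and tool time. The 30-call, 500-token run is a hypothetical workload, not a measured one.

```python
def chain_generation_seconds(calls: int, tokens_per_call: int, tok_per_s: float) -> float:
    """Pure generation time for `calls` sequential calls.

    Assumes every call emits `tokens_per_call` output tokens and ignores
    time to first token, network latency and tool execution.
    """
    return calls * tokens_per_call / tok_per_s


# Hypothetical agent run: 30 sequential calls, about 500 output tokens each.
CALLS, TOKENS = 30, 500
opus = chain_generation_seconds(CALLS, TOKENS, 65)
gemini = chain_generation_seconds(CALLS, TOKENS, 148)
print(f"Opus 4.8: {opus:.0f}s, Gemini 3.1 Pro: {gemini:.0f}s, ratio {gemini / opus:.2f}")
```

Under these assumptions the ratio is always 65/148 ≈ 0.44, whatever the call count or token budget. Adding a constant per-call overhead to both sides pushes it back up toward 0.5.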
The cheap end keeps closing the gap
Qwen3.7 Max (Alibaba) is the open-weight story worth tracking: 56.6 quality at $1.88/M and 190 tok/s. That's 0.6 points below Gemini Pro at less than half the price, with higher throughput.
For batch jobs where retries dominate cost, Qwen3.7 Max changes the math. At $1.88/M versus $4.50/M, you can pay for about 2.4 Qwen calls (the first attempt plus more than one retry) before the bill matches a single Gemini Pro call of the same size.
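One way to frame "absorbing retries" is as the expected cost per successful call. If each attempt fails independently with probability p and you retry until success, the number of attempts is geometric with mean 1/(1 - p). The Python sketch below applies that model under two assumptions: every attempt uses the same number of tokens, and Gemini always succeeds on the first try, which is the most favourable case for Gemini. The helper names are illustrative.

```python
# Prices quoted above, USD per 1M tokens.
GEMINI_PRICE = 4.50
QWEN_PRICE = 1.88


def expected_cost_per_success(price: float, failure_rate: float) -> float:
    """Expected spend per successful call when failed calls are retried
    until one succeeds. Attempts are geometric with mean 1 / (1 - p).
    Assumes independent failures and equal tokens per attempt."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return price / (1 - failure_rate)


def break_even_failure_rate(cheap: float, reference: float) -> float:
    """Failure rate at which the cheap model's expected cost per success
    equals one first-try call on the reference model."""
    return 1 - cheap / reference


p = break_even_failure_rate(QWEN_PRICE, GEMINI_PRICE)
print(f"break-even failure rate: {p:.0%}")
print(f"Qwen at 50% failures: ${expected_cost_per_success(QWEN_PRICE, 0.5):.2f} "
      f"vs Gemini first try: ${GEMINI_PRICE:.2f}")
```

Under this model, Qwen3.7 Max stays cheaper per success until roughly 58% of its attempts fail.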
MiMo-V2.5-Pro (Xiaomi) is the budget outlier at $0.54/M and 53.8 quality. The catch is speed — 53 tok/s makes it a poor fit for interactive use, but fine for overnight batch where latency is irrelevant.
GPT-5.5's reasoning tiers don't help
OpenAI ships GPT-5.5 alongside high and medium reasoning variants, all priced at $11.25/M. The plain model (60.2) outscores both high (58.9) and medium (56.7). Paying frontier price for the medium tier's 56.7 makes no sense when Gemini Pro hits 57.2 for 60% less.
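The "makes no sense" argument can be stated as Pareto dominance on two axes. One offer dominates another if it is at least as good on quality and price and strictly better on one of them. Framing it this way is my own choice, not the article's method. The `Offer` type and the tier labels below are hypothetical, and the scores and prices are the ones quoted above.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    name: str
    quality: float
    price: float  # USD per 1M tokens


def dominates(a: Offer, b: Offer) -> bool:
    """True if `a` is no worse than `b` on both axes (higher quality,
    lower price) and strictly better on at least one."""
    no_worse = a.quality >= b.quality and a.price <= b.price
    strictly_better = a.quality > b.quality or a.price < b.price
    return no_worse and strictly_better


gemini = Offer("Gemini 3.1 Pro", 57.2, 4.50)
gpt_tiers = [
    Offer("GPT-5.5", 60.2, 11.25),
    Offer("GPT-5.5 (high)", 58.9, 11.25),
    Offer("GPT-5.5 (medium)", 56.7, 11.25),
]

for tier in gpt_tiers:
    print(f"{gemini.name} dominates {tier.name}: {dominates(gemini, tier)}")
```

Gemini 3.1 Pro dominates only the medium tier. The plain and high tiers still buy extra quality, just at a steep price.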
What to watch
- Whether Gemini 3.1 Pro stays in "Preview" or gets a price bump on general availability — the $4.50 number is the whole argument.
- Qwen3.7 Max adoption in production agentic stacks now that it's within a point of Gemini Pro on quality.
- Whether Anthropic widens the quality lead beyond 1.2 points, or whether the top tier stays a coin-flip between Opus and GPT-5.5.
Need to match a model to a specific workload? Start with the LLM Selector or browse the full field on Explore.