Claude Opus 4.8 takes the quality lead as Gemini 3.1 Pro undercuts it by 55%

Claude Opus 4.8 tops quality at 61.4 but costs $10/M. Gemini 3.1 Pro hits 57.2 at $4.50. Here's where the price-performance line actually sits this week.

FindLLMJune 1, 2026

claude-opus-4-8gemini-3-1-progpt-5-5weekly-briefing

Claude Opus 4.8 (Anthropic) leads quality this week at 61.4, edging past GPT-5.5 (OpenAI) at 60.2. Both sit above $10/M. The more interesting number: Gemini 3.1 Pro Preview (Google) lands at 57.2 quality for $4.50/M — within 4.2 points of the top while costing 55% less.

The top tier is getting expensive to defend

Anthropic's lead is real but thin. Opus 4.8 scores 61.4 against GPT-5.5's 60.2, a 1.2-point gap that won't show up in most production workloads. Both run slow for frontier models: 65 and 66 tok/s respectively.

For anything latency-sensitive, neither is the right pick. The premium you pay at $10–11.25/M buys you the last couple of quality points and not much else.

Quality comparison

Where the value actually is

Gemini 3.1 Pro is the model I'd reach for as a default this week. At 57.2 quality, $4.50/M, and 148 tok/s, it beats the top tier on throughput by more than 2x while giving up almost nothing on quality.

Model	Quality	Price/1M	Speed
Claude Opus 4.8	61.4	$10.00	65 tok/s
GPT-5.5	60.2	$11.25	66 tok/s
Gemini 3.1 Pro	57.2	$4.50	148 tok/s
Qwen3.7 Max	56.6	$1.88	190 tok/s

The faster iteration loop matters for agentic pipelines where you chain dozens of calls. At 148 tok/s versus 65, Gemini cuts wall-clock time on multi-step chains roughly in half.

The cheap end keeps closing the gap

Qwen3.7 Max (Alibaba) is the open-weight story worth tracking: 56.6 quality at $1.88/M and 190 tok/s. That's 0.6 points below Gemini Pro at less than half the price, with higher throughput.

For batch jobs where retries dominate cost, Qwen3.7 Max changes the math. You can absorb more failed-and-retried calls before the bill matches a single Gemini Pro run.

MiMo-V2.5-Pro (Xiaomi) is the budget outlier at $0.54/M and 53.8 quality. The catch is speed — 53 tok/s makes it a poor fit for interactive use, but fine for overnight batch where latency is irrelevant.

Price comparison

GPT-5.5's reasoning tiers don't help

OpenAI ships GPT-5.5 in high/medium variants, all at $11.25/M. The plain model (60.2) outscores high (58.9) and medium (56.7). Paying frontier price for the medium tier at 56.7 quality makes no sense when Gemini Pro hits 57.2 for 60% less.

What to watch

Whether Gemini 3.1 Pro stays in "Preview" or gets a price bump on general availability — the $4.50 number is the whole argument.
Qwen3.7 Max adoption in production agentic stacks now that it's within a point of Gemini Pro on quality.
Whether Anthropic widens the quality lead beyond 1.2 points, or whether the top tier stays a coin-flip between Opus and GPT-5.5.

Need to match a model to a specific workload? Start with the LLM Selector or browse the full field on Explore.

Stay in the loop

Reviewed LLM analysis when a new edition is ready. No spam.