The top three are now separated by half a point

Claude Opus 4.7 (Anthropic) leads the quality index at 57.3, with Gemini 3.1 Pro Preview (Google) at 57.2 and GPT-5.4 (OpenAI) at 56.8. That's a 0.5-point spread across three different providers. The practical difference at this tier is negligible for most workloads; the real differentiators are price and throughput.

Where the money argument gets interesting

Model	Quality	Price/1M input	Speed
Claude Opus 4.7	57.3	$10.00	53 tok/s
Gemini 3.1 Pro Preview	57.2	$4.50	134 tok/s
GPT-5.4	56.8	$5.63	86 tok/s
Grok 4.20	49.3	$3.00	222 tok/s

Gemini 3.1 Pro Preview delivers 99.8% of Claude Opus 4.7's quality at 45% of the cost and 2.5x the inference speed. For batch processing, RAG pipelines, or any workload where you're paying per token at scale, that's the clear pick. Claude Opus 4.7 costs $10/M tokens and runs at 53 tok/s. I struggle to justify that premium when the quality gap is 0.1 points.
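To make the arithmetic behind those ratios easy to check, here is a minimal sketch that recomputes them from the table figures. The `MODELS` dictionary and the `compare` helper are names I made up for illustration; the numbers are copied straight from the table, with price read as dollars per 1M input tokens.

```python
# Figures copied from the table: (quality index, USD per 1M input tokens, tok/s).
MODELS = {
    "Claude Opus 4.7": (57.3, 10.00, 53),
    "Gemini 3.1 Pro Preview": (57.2, 4.50, 134),
    "GPT-5.4": (56.8, 5.63, 86),
    "Grok 4.20": (49.3, 3.00, 222),
}


def compare(model: str, baseline: str) -> dict:
    """Express one model's quality, cost, and speed relative to a baseline model."""
    quality, price, speed = MODELS[model]
    base_quality, base_price, base_speed = MODELS[baseline]
    return {
        "quality_pct": round(100 * quality / base_quality, 1),  # % of baseline quality
        "cost_pct": round(100 * price / base_price, 1),         # % of baseline price
        "speed_x": round(speed / base_speed, 1),                # multiple of baseline speed
    }


print(compare("Gemini 3.1 Pro Preview", "Claude Opus 4.7"))
# {'quality_pct': 99.8, 'cost_pct': 45.0, 'speed_x': 2.5}
```

Swapping in a different baseline (say, `compare("Grok 4.20", "GPT-5.4")`) gives the same three ratios for any other pair in the table.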

GPT-5.4 sits in the middle on every axis. Not the cheapest, not the fastest, not the highest quality. It's a reasonable default if you're already on OpenAI infrastructure, but it's no longer the obvious choice for anything specific.

Grok 4.20: fastest model in the index by a wide margin

Grok 4.20 from xAI hits 222 tok/s at $3.00/M tokens. That's about 66% faster than the next-closest model (Gemini 3.1 Pro Preview at 134 tok/s). The quality score of 49.3 puts it below the frontier tier, but for latency-sensitive applications like interactive agents, autocomplete, or any pipeline where iteration speed matters more than peak accuracy, nothing else comes close.
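A rough way to see what the throughput gap means in wall-clock terms is to convert tok/s into seconds per response. This is a sketch under simplifying assumptions: the 1,000-token response length is hypothetical, and the calculation ignores time to first token, network latency, and rate limits, none of which the index figures here cover.

```python
# Throughput figures (tok/s) copied from the table.
SPEEDS = {
    "Grok 4.20": 222,
    "Gemini 3.1 Pro Preview": 134,
    "GPT-5.4": 86,
    "Claude Opus 4.7": 53,
}

RESPONSE_TOKENS = 1000  # hypothetical response length, chosen for illustration


def seconds_to_generate(tokens: int, tok_per_s: float) -> float:
    """Generation time only; assumes constant throughput and no startup latency."""
    return tokens / tok_per_s


def percent_faster(fast: float, slow: float) -> float:
    """How much faster `fast` is than `slow`, as a percentage."""
    return (fast / slow - 1) * 100


for name, speed in SPEEDS.items():
    print(f"{name}: {seconds_to_generate(RESPONSE_TOKENS, speed):.1f} s")

print(f"Grok vs Gemini: {percent_faster(222, 134):.1f}% faster")
```

Under those assumptions a 1,000-token response takes about 4.5 seconds on Grok 4.20 and about 18.9 seconds on Claude Opus 4.7, which is the kind of difference a user of an interactive agent will notice.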

The budget tier keeps compressing

MiniMax M2.7 scores 49.6 quality at $0.52/M tokens. That's 87.3% of GPT-5.4's quality for under a tenth of the price. Qwen3.6 Plus at $0.73/M and a third model at $1.11/M round out a sub-$1.50 tier where three open-weight or budget models cluster within half a point of 50.0 quality. For classification, summarization, and structured extraction at high volume, this tier now handles what required frontier models a year ago.
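To put the price gap in budget terms, here is a small sketch that prices the same workload on MiniMax M2.7 and GPT-5.4. The monthly volume of 2 billion input tokens is a hypothetical figure, the `workload_cost` helper is my own name, and the calculation treats the quoted prices as input-token prices only, so output-token charges are left out.

```python
# Prices (USD per 1M tokens) and quality scores quoted in this post.
PRICE_PER_M = {"MiniMax M2.7": 0.52, "GPT-5.4": 5.63}
QUALITY = {"MiniMax M2.7": 49.6, "GPT-5.4": 56.8}

MONTHLY_INPUT_TOKENS = 2_000_000_000  # hypothetical workload size


def workload_cost(input_tokens: int, price_per_million: float) -> float:
    """Input-token cost in USD; output tokens are deliberately ignored."""
    return input_tokens / 1_000_000 * price_per_million


budget = workload_cost(MONTHLY_INPUT_TOKENS, PRICE_PER_M["MiniMax M2.7"])
frontier = workload_cost(MONTHLY_INPUT_TOKENS, PRICE_PER_M["GPT-5.4"])
quality_ratio = QUALITY["MiniMax M2.7"] / QUALITY["GPT-5.4"]

print(f"MiniMax M2.7: ${budget:,.0f} per month")
print(f"GPT-5.4:      ${frontier:,.0f} per month")
print(f"Quality retained: {quality_ratio:.1%}")
```

At that volume the bill is roughly $1,040 versus $11,260 a month, so the question becomes whether the extra 7.2 quality points are worth about $10,000 for the task at hand.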
