Gemini 3.1 Pro Preview (Google) scores 57.2 on the quality index at $4.50/M tokens while generating 135 tokens per second. Claude Opus 4.7 (Anthropic) scores 57.3 at $10.00/M tokens and 66 tok/s. That 0.1-point quality gap is noise. The 55% cost reduction and 2x throughput advantage are not.
What does a 0.1-point quality edge actually buy you?
Nothing measurable in production. At the top of the quality leaderboard, GPT-5.5 (OpenAI) sits alone at 60.2, a meaningful 3-point gap above everything else. But the cluster from 56.8 to 57.3—where Claude Opus 4.7, Gemini 3.1 Pro Preview, and GPT-5.4 all live—represents functionally equivalent output quality for most generative tasks. The question is never "which model is 0.1 points better?" It's "which model delivers equivalent quality at the lowest operational cost?"
And on that question, Gemini 3.1 Pro Preview wins decisively.
The throughput argument compounds
At 135 tok/s, Gemini 3.1 Pro Preview doesn't just cost less per token; it also finishes generating sooner. For interactive applications, that's the difference between a roughly 4-second and an 8-second response on a 500-token completion. For batch pipelines processing thousands of requests, it's the difference between a 4-hour job and an 8-hour job.
Consider a workload generating 100M output tokens per month. With Claude Opus 4.7, that's $1,000/month in token costs alone. With Gemini 3.1 Pro Preview, it's $450. Over a year, the $6,600 saved is real infrastructure budget. And because Gemini completes requests in roughly half the wall-clock time, you need fewer concurrent connections to maintain the same throughput, reducing orchestration complexity.
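To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. The 100M-token volume and the per-million-token prices are the figures quoted above; everything the loop ignores (input-token pricing, caching, retries, concurrency overhead) is a simplification of mine, not anything either vendor publishes.

```python
# Rough monthly cost and wall-clock comparison for an output-heavy workload.
# Prices ($/M output tokens) and throughput (tok/s) are the figures quoted above.

MONTHLY_OUTPUT_TOKENS = 100_000_000

models = {
    "Claude Opus 4.7":        {"price_per_m": 10.00, "tok_per_s": 66},
    "Gemini 3.1 Pro Preview": {"price_per_m": 4.50,  "tok_per_s": 135},
}

for name, m in models.items():
    monthly_cost = MONTHLY_OUTPUT_TOKENS / 1_000_000 * m["price_per_m"]
    # Serial generation time for the whole month's output, in hours,
    # assuming a single stream running at the advertised throughput.
    serial_hours = MONTHLY_OUTPUT_TOKENS / m["tok_per_s"] / 3600
    print(f"{name}: ${monthly_cost:,.0f}/month, {serial_hours:,.0f} serial hours")

# Gemini 3.1 Pro Preview comes out at $450/month vs. $1,000/month and roughly
# half the total generation time, i.e. fewer concurrent streams for the same throughput.
```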
When Claude Opus 4.7 still makes sense
I won't pretend this is entirely one-sided. "Preview" in Gemini 3.1 Pro's name signals instability risk—API behavior, rate limits, and availability guarantees may shift. If your production system requires SLA-backed uptime and you've already built tooling around Anthropic's API conventions, the switching cost is real.
There's also the question of specific task profiles. Quality index scores are composites. Claude Opus 4.7 may outperform on particular subtasks—long-form reasoning chains, nuanced instruction following, or specific coding patterns—where Gemini's aggregate score masks weaknesses. Without task-specific benchmarks, the 0.1-point composite gap tells us these models are peers on average, not that they're identical on every input.
GPT-5.5 occupies a different tier entirely
At 60.2 quality and $11.25/M tokens, GPT-5.5 is the only model that justifies premium pricing through a clear quality separation. Three points above the next cluster is a gap you can feel in output. But it's also 2.5x the cost of Gemini 3.1 Pro Preview for a 5% quality improvement, and it runs at 65 tok/s—less than half Gemini's throughput.
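For a sense of how steep that premium is, here is a small sketch of the marginal price per quality point, using only the per-million-token prices and quality-index scores quoted in this post; treating composite quality points as linearly comparable is, of course, a simplification.

```python
# Marginal price of GPT-5.5's quality edge over Gemini 3.1 Pro Preview,
# using the prices ($/M tokens) and quality-index scores quoted above.
gemini = {"price": 4.50, "quality": 57.2}
gpt55 = {"price": 11.25, "quality": 60.2}

extra_price = gpt55["price"] - gemini["price"]        # $6.75 per 1M tokens
extra_quality = gpt55["quality"] - gemini["quality"]  # 3.0 quality points

print(f"${extra_price / extra_quality:.2f}/M tokens per marginal quality point")
# -> $2.25/M tokens per marginal quality point, versus roughly $0.08/M per point
#    for Gemini's baseline 57.2 (4.50 / 57.2).
```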
The calculus is straightforward: if your application demands peak quality and cost is secondary, GPT-5.5 is the pick. For everything else, the value proposition collapses rapidly as you move down the price curve.
The pricing tier below tells a similar story
GPT-5.4 at $5.63/M and 56.8 quality offers marginally worse output than Gemini 3.1 Pro Preview at a 25% price premium with slower throughput (86 vs. 135 tok/s). It's not a terrible model, but it occupies an awkward position: more expensive than Gemini with no quality advantage to show for it.
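One way to see why that position is awkward: against the three numbers in this comparison, GPT-5.4 is strictly dominated by Gemini 3.1 Pro Preview on price, quality, and throughput at once. The dominance check below is my own framing, not part of any leaderboard, and it deliberately ignores everything else (latency to first token, context window, API ergonomics).

```python
# Simple Pareto-dominance check over the four models discussed here, using only
# the figures quoted in this post: price ($/M tokens), quality index, throughput (tok/s).
# Lower price is better; higher quality and throughput are better.

models = {
    "GPT-5.5":                {"price": 11.25, "quality": 60.2, "tok_s": 65},
    "Claude Opus 4.7":        {"price": 10.00, "quality": 57.3, "tok_s": 66},
    "Gemini 3.1 Pro Preview": {"price": 4.50,  "quality": 57.2, "tok_s": 135},
    "GPT-5.4":                {"price": 5.63,  "quality": 56.8, "tok_s": 86},
}

def dominates(a, b):
    """True if model a is at least as good as b on every axis and strictly better on one."""
    at_least = (a["price"] <= b["price"] and a["quality"] >= b["quality"]
                and a["tok_s"] >= b["tok_s"])
    strictly = (a["price"] < b["price"] or a["quality"] > b["quality"]
                or a["tok_s"] > b["tok_s"])
    return at_least and strictly

for name_a, a in models.items():
    for name_b, b in models.items():
        if name_a != name_b and dominates(a, b):
            print(f"{name_a} dominates {name_b}")
# -> Only one line prints: Gemini 3.1 Pro Preview dominates GPT-5.4
#    (cheaper, higher quality, faster). Every other pairing involves a trade-off.
```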
My recommendation
For new deployments where you're selecting a high-quality model and don't have existing vendor lock-in, Gemini 3.1 Pro Preview is the rational default. Same quality tier as Claude Opus 4.7, less than half the cost, double the inference speed. The "Preview" caveat is worth monitoring, but the operational advantages are too large to ignore based on naming conventions alone.
If you need the absolute highest quality available today and budget permits, GPT-5.5 remains alone at the top. Everything in between is a compromise that Gemini 3.1 Pro Preview renders unnecessary.
Explore the full comparison at LLM Selector or browse all models on Explore.