Skip to main content
Back to Blog

Gemini 3.1 Pro matches Claude Opus 4.7 quality at less than half the price

Google's Gemini 3.1 Pro Preview closes to within 0.1 points of Claude Opus 4.7 while costing $4.50 vs $10.00/M tokens. Plus: Grok 4.3 emerges as the speed-value pick.

FindLLMMay 4, 2026
weekly-briefinggemini-3-1-proclaude-opus-4-7grok-4-3pricing

The price-quality gap is collapsing in the mid-tier

Gemini 3.1 Pro Preview (Google) now scores 57.2 on quality index — just 0.1 points behind Claude Opus 4.7 (Anthropic) at 57.3. The cost difference: $4.50 vs $10.00 per million tokens. At 131 tok/s versus 59 tok/s, Gemini also delivers inference more than twice as fast. For any workload where you're not squeezing the last fraction of quality, the economic case for Claude Opus at this tier has eroded significantly.

The leaderboard this week

ModelQualityPrice/MSpeedBest for
GPT-5.560.2$11.2575 tok/sPeak quality, cost-insensitive
Claude Opus 4.757.3$10.0059 tok/sAnthropic ecosystem lock-in
Gemini 3.1 Pro Preview57.2$4.50131 tok/sThroughput-heavy production
Grok 4.353.2$1.56112 tok/sHigh-volume, cost-constrained

Quality comparison

Grok 4.3 is the quiet mover

Grok 4.3 (xAI) sits at 53.2 quality, $1.56/M tokens, and 112 tok/s. That's faster than GPT-5.5 and cheaper than nearly everything above it. The quality gap to Gemini 3.1 Pro is 4 points, which matters for complex reasoning chains. But for classification, extraction, and structured output tasks where 53+ quality suffices, Grok delivers the best tokens-per-dollar at high throughput. Fewer retries at 112 tok/s means tighter iteration loops in agent architectures.

Price comparison

The open-source bracket

Kimi K2.6 (MoonshotAI) leads open-weight models at 53.9 quality and $1.43/M tokens, though inference latency at 31 tok/s limits its use in real-time applications. Qwen3.6 Max Preview (Alibaba) offers 51.8 quality at $2.92/M — nearly double Kimi's price for lower quality. For self-hosted deployments where you control the inference stack, Kimi K2.6 is the clear pick if you can tolerate the throughput constraint.

GPT-5.5 reasoning tiers: diminishing returns confirmed

The spread between GPT-5.5 (60.2), GPT-5.5 high (58.9), and GPT-5.5 medium (56.7) all at $11.25/M tokens makes the default tier the only rational choice unless you need deterministic latency from the medium tier. I covered this in depth last week, but it bears repeating: same price, 3.5 quality points difference. Always use default.

What to watch

Gemini 3.1 Pro leaving preview. If Google holds the $4.50 price at GA, it becomes the default recommendation for production workloads in the 55-58 quality range.

Grok's next move. At $1.56/M and 112 tok/s, xAI is positioned to capture batch processing workloads. Any quality bump above 55 makes it a serious threat to Gemini's value proposition.

Meta's Muse Spark. Listed at 52.1 quality with no pricing or speed data yet. If Meta prices this aggressively as an open model, the sub-$1/M tier could get interesting fast.

Find the right model for your workload constraints with the LLM Selector.

Stay in the loop

Weekly LLM analysis delivered to your inbox. No spam.