Gemini 3.1 Pro holds at $4.50 while GPT-5.5 variants fragment the top tier

Weekly LLM briefing: GPT-5.5 leads quality but three effort tiers muddy the picture. Gemini 3.1 Pro stays the value play. Kimi K2.6 undercuts everyone.

FindLLMMay 18, 2026

weekly-briefingllm-pricinggpt-5-5gemini-3-1-prokimi-k2-6

The three-tier GPT-5.5 problem

GPT-5.5 (OpenAI) still tops quality at 60.2, but the gap between its default, high, and medium effort tiers is now visible enough to matter. GPT-5.5 (high) scores 58.9, GPT-5.5 (medium) drops to 56.7, and all three cost the same $11.25/M tokens. That pricing structure makes no sense for production workloads. If you're running medium effort, you're paying a 2x premium over GPT-5.4 at $5.63/M for nearly identical quality (56.7 vs 56.8). Unless you need the default tier's full 60.2, GPT-5.4 is the rational choice in the OpenAI lineup.

Where the value actually is

The mid-table tells the real story this week. Three models cluster between 53 and 54 quality at radically different price points:

Model	Quality	Price/M tokens	Speed	Open source
Gemini 3.1 Pro Preview	57.2	$4.50	133 tok/s	No
Kimi K2.6	53.9	$1.42	44 tok/s	Yes
Grok 4.3	53.2	$1.56	116 tok/s	No
MiMo-V2.5-Pro	53.8	$1.50	57 tok/s	No

Kimi K2.6 (MoonshotAI) at $1.42/M is the cheapest model in the top 15 and the only open-source option with quality above 53. For batch inference where inference latency doesn't dominate, it's hard to argue against. The 44 tok/s throughput is the trade-off: tight iteration loops will feel slow.

Grok 4.3 (xAI) offers a different profile. At 116 tok/s and $1.56/M, it's the fastest sub-$2 model by a wide margin. If your pipeline is latency-sensitive and cost-constrained, Grok 4.3 is the pick over Kimi despite the 0.7-point quality gap.

Price comparison

Gemini 3.1 Pro remains the awkward middle

Gemini 3.1 Pro Preview at 57.2 quality and 133 tok/s continues to occupy a strange position: it's 0.4 points behind Claude Opus 4.7 at less than half the price, and it's the fastest model in the entire top tier. For workloads where throughput matters more than squeezing out the last quality point, Gemini 3.1 Pro is the best option under $5/M tokens. I covered this in detail last week, but the math hasn't changed because nothing else has moved.

Speed comparison

Anthropic's pricing looks increasingly isolated

Anthropic now has five models in the top 15, priced between $6.56 and $10.94/M. None of them crack 58 quality. Claude Opus 4.7 at 57.3 and $10.00/M is 0.1 points above Gemini 3.1 Pro at $4.50/M. The adaptive reasoning variants add cost without clear quality gains in this snapshot. Unless you have Anthropic-specific tooling dependencies, the value proposition is thin.

What to watch

Qwen3.6 Max at 51.8 quality and $2.92/M is still in preview. If Alibaba pushes quality above 53 at that price, the sub-$3 tier gets genuinely competitive.
Muse Spark (Meta) has no pricing or speed data yet. An open-source model at 52.2 quality from Meta could shift self-hosting economics.
Whether OpenAI rationalizes GPT-5.5 tier pricing. Charging $11.25 for medium effort when GPT-5.4 exists at $5.63 is a temporary situation.

Find the right model for your workload with the LLM Selector or browse the full rankings on Explore.

Stay in the loop

Reviewed LLM analysis when a new edition is ready. No spam.