Kimi K2.6 and Grok 4.3 undercut the field below $1.60 while GPT-5.5 stays expensive at the top

Weekly LLM briefing: Kimi K2.6 hits 53.9 quality at $1.42/M tokens, Grok 4.3 delivers 133 tok/s at $1.56, and the budget tier closes in on mid-range models.

FindLLM · May 25, 2026

Tags: weekly-briefing, kimi-k2-6, grok-4-3, qwen3-7-max, budget-llms

The sub-$2 tier now scores within 7 points of the best model available

Two models priced under $1.60 per million tokens now score above 53 on the quality index. Kimi K2.6 (MoonshotAI) posts 53.9 at $1.42/M, and Grok 4.3 (xAI) hits 53.2 at $1.56/M with 133 tokens per second. That puts them 6-7 points behind GPT-5.5 (OpenAI), which leads at 60.2 but costs $11.25/M. The quality gap is narrowing faster than the price gap is closing.

What moved this week

The big story isn't a single launch. It's the shape of the market crystallizing into three distinct price bands with diminishing quality returns at the top.

Premium ($10+/M): GPT-5.5 holds the quality crown at 60.2, but at 66 tok/s it's the slowest model in the table below. Claude Opus 4.7 scores 57.3 at $10.00/M with even slower throughput at 49 tok/s. GPT-5.5 costs 2.5x as much as Gemini 3.1 Pro Preview for 3 points of quality, and 7-8x as much as the budget tier for 6-7 points.

Mid-range ($3-6/M): This is where value concentrates. Gemini 3.1 Pro Preview delivers 57.2 quality at $4.50/M and 125 tok/s. Qwen3.7 Max (Alibaba) is open source, scores 56.6 at $3.75/M, and pushes 198 tok/s. Gemini 3.5 Flash trades 2 quality points for 210 tok/s at $3.38/M.

Budget (<$2/M): Kimi K2.6 and Grok 4.3 now compete with mid-range models that cost 2-3x as much. Kimi is open source, which matters for self-hosting.

Model	Quality index	Price ($/M tokens)	Speed	Open source
GPT-5.5	60.2	$11.25	66 tok/s	No
Gemini 3.1 Pro	57.2	$4.50	125 tok/s	No
Qwen3.7 Max	56.6	$3.75	198 tok/s	Yes
Kimi K2.6	53.9	$1.42	103 tok/s	Yes
Grok 4.3	53.2	$1.56	133 tok/s	No
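
The diminishing returns at the top are easier to see when each row is expressed relative to the quality leader. A minimal sketch in Python, using only the figures from the table above; the dictionary layout and the function name `gap_to_leader` are illustrative and not part of any FindLLM tooling:

```python
# Figures copied from the table above: (quality index, price in $ per million tokens).
MODELS = {
    "GPT-5.5": (60.2, 11.25),
    "Gemini 3.1 Pro": (57.2, 4.50),
    "Qwen3.7 Max": (56.6, 3.75),
    "Kimi K2.6": (53.9, 1.42),
    "Grok 4.3": (53.2, 1.56),
}


def gap_to_leader(models):
    """Map each model to (points behind the leader, leader price / model price)."""
    # Tuples compare by their first element, so max() picks the highest quality.
    lead_quality, lead_price = max(models.values())
    return {
        name: (round(lead_quality - quality, 1), round(lead_price / price, 1))
        for name, (quality, price) in models.items()
    }


for name, (points_behind, price_multiple) in gap_to_leader(MODELS).items():
    print(f"{name:<15} {points_behind:>4} pts behind, leader costs {price_multiple}x as much")
```

On these numbers, Kimi K2.6 sits 6.3 points behind GPT-5.5 while GPT-5.5 costs about 7.9x as much.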

[Chart: Price comparison]

Where this matters operationally

If your workload involves high-volume batch processing where retries are rare, Kimi K2.6 at $1.42/M cuts API costs by 87% versus GPT-5.5 with a quality drop you may not notice in summarization or extraction tasks. For latency-sensitive applications, Qwen3.7 Max at 198 tok/s delivers 3x the throughput of GPT-5.5 while scoring only 3.6 points lower.
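
The savings and throughput figures above are simple ratios of the quoted prices and speeds. The sketch below reproduces them and adds a retry adjustment. The retry model (independent failures, each failed call retried until it succeeds), the 10% retry rate, and the 500M-token monthly volume are assumptions for illustration, not figures from this briefing:

```python
def savings_pct(price, reference_price):
    """Percentage saved by paying `price` instead of `reference_price`."""
    return (1 - price / reference_price) * 100


def effective_price(price, retry_rate):
    """Expected price per million useful tokens when some calls are retried.

    Assumes independent failures retried until success, so the expected
    number of attempts per call is 1 / (1 - retry_rate).
    """
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    return price / (1 - retry_rate)


# Prices ($/M tokens) and speeds (tok/s) quoted in the briefing.
KIMI_PRICE, GPT_PRICE = 1.42, 11.25
QWEN_SPEED, GPT_SPEED = 198, 66

print(f"Kimi K2.6 vs GPT-5.5 savings: {savings_pct(KIMI_PRICE, GPT_PRICE):.0f}%")
print(f"Qwen3.7 Max vs GPT-5.5 throughput: {QWEN_SPEED / GPT_SPEED:.1f}x")

# Hypothetical workload: 500M tokens per month, 10% of Kimi calls retried.
volume_m = 500
print(f"GPT-5.5: ${volume_m * GPT_PRICE:,.0f}/month")
print(f"Kimi K2.6 with retries: ${volume_m * effective_price(KIMI_PRICE, 0.10):,.0f}/month")
```

Under this retry model the budget option stays cheaper until the retry rate approaches the savings percentage itself, which is why the "retries are rare" condition matters less for cost than for latency and reliability.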

The premium tier makes sense when those few quality points compound: multi-step reasoning chains, complex code generation, tasks where a single error triggers expensive downstream failures. For everything else, the mid-range and budget tiers are now hard to justify skipping.
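
To see why small per-step differences compound, consider a chain in which every step must succeed. The per-step success rates below are hypothetical and are not derived from the quality index, which does not map directly to a success probability; the sketch only illustrates the arithmetic of compounding:

```python
def chain_success(per_step_success, steps):
    """Probability that all `steps` independent steps succeed."""
    return per_step_success ** steps


# Hypothetical per-step success rates, chosen only to illustrate compounding.
for steps in (1, 5, 10, 20):
    strong = chain_success(0.97, steps)
    weak = chain_success(0.95, steps)
    print(f"{steps:>2} steps: {strong:.3f} vs {weak:.3f} (gap {strong - weak:.3f})")
```

Under these assumptions, a 2-point difference per step grows to roughly 14 points over a 10-step chain.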

[Chart: Quality comparison]

Who should care about Grok 4.3

At 133 tok/s and $1.56/M, Grok 4.3 is the fastest model under $2. If you're building interactive applications where inference latency directly affects user experience, it's worth benchmarking against Gemini 3.5 Flash (210 tok/s at $3.38/M). You get 63% of Flash's speed at 46% of the price.
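
The trade-off against Gemini 3.5 Flash comes down to two ratios, computed here from the speeds and prices quoted in this briefing. The `Model` class and its `speed_per_dollar` metric are illustrative summaries, not measures FindLLM reports:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    speed_tok_s: float  # output speed quoted in the briefing
    price_per_m: float  # $ per million tokens quoted in the briefing

    @property
    def speed_per_dollar(self) -> float:
        """Tokens per second per $1/M of price (illustrative metric)."""
        return self.speed_tok_s / self.price_per_m


grok = Model("Grok 4.3", 133, 1.56)
flash = Model("Gemini 3.5 Flash", 210, 3.38)

speed_ratio = grok.speed_tok_s / flash.speed_tok_s
price_ratio = grok.price_per_m / flash.price_per_m
print(f"speed: {speed_ratio:.0%} of Flash, price: {price_ratio:.0%} of Flash")
print(f"tok/s per $/M: {grok.speed_per_dollar:.0f} vs {flash.speed_per_dollar:.0f}")
```

Whether raw speed or speed per dollar matters more depends on whether latency or cost is the binding constraint, so benchmark on your own prompts before deciding.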

What to watch

Qwen3.7 Max self-hosted performance. It is open source, scores 56.6 on quality, and reaches 198 tok/s via the API. Real-world self-hosted throughput on consumer hardware will determine whether it displaces Llama variants in local deployments.
GPT-5.5 pricing pressure. OpenAI holds the quality lead, but $11.25/M looks increasingly difficult to defend when Gemini 3.1 Pro sits 3 points below at 60% less cost.
Kimi's next move. MoonshotAI priced K2.6 aggressively and made it open source. If quality climbs past 55 in a point release, the mid-tier gets squeezed from below.

Find the right model for your workload and budget on the LLM Selector, or browse the full rankings on Explore.
