
Qwen3.7 Max hits 56.6 quality at $1.88/M as mid-tier value war intensifies

Claude Opus 4.8 and GPT-5.5 anchor the top tier while Qwen, Gemini and GPT-5.4 reshape the $5/M segment.

FindLLM · June 15, 2026
Tags: weekly-briefing, llm-market, value-comparison

The top tier is settled; the mid-tier is where the action is

Claude Fable 5 still owns the quality crown at 64.9, but at $20/M it's a specialized tool, not a default. Below it, the real fight is between Claude Opus 4.8 at 61.4 quality / $10/M and GPT-5.5 at 60.2 / $11.25/M. GPT-5.5 costs 12.5% more per million tokens and scores 1.2 points lower; its edge is throughput, since Opus 4.8 is 12 tok/s slower. For most production pipelines, that's a wash.
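The trade-off above is simple ratio arithmetic. A minimal Python sketch, using only the quality scores and per-million-token prices quoted in this post; the `Model` class and the helper names are illustrative, not part of any FindLLM API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float      # quality index as reported in this briefing
    price_per_m: float  # USD per million tokens


def price_premium_pct(expensive: Model, cheap: Model) -> float:
    """How much more `expensive` costs per million tokens, as a % of `cheap`."""
    return (expensive.price_per_m / cheap.price_per_m - 1) * 100


def quality_gap(a: Model, b: Model) -> float:
    """Quality points by which `a` leads `b` (negative if it trails)."""
    return round(a.quality - b.quality, 1)


opus = Model("Claude Opus 4.8", 61.4, 10.00)
gpt55 = Model("GPT-5.5", 60.2, 11.25)
fable = Model("Claude Fable 5", 64.9, 20.00)

print(f"GPT-5.5 costs {price_premium_pct(gpt55, opus):.1f}% more than Opus 4.8")
print(f"and trails it by {quality_gap(opus, gpt55)} quality points")
print(f"Fable 5 costs {price_premium_pct(fable, opus):.0f}% more "
      f"for a {quality_gap(fable, opus)}-point lead")
```

Framed this way, Fable 5's premium is a doubling of price for 3.5 points, which is why it reads as a specialized tool rather than a default.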

The interesting story sits at quality 55–57, where five models compete on radically different trade-offs.

The $5/M segment is now crowded

| Model | Quality | Price ($/M tokens) | Speed (tok/s) |
|---|---|---|---|
| Gemini 3.1 Pro Preview | 57.2 | $4.50 | 142 |
| GPT-5.4 | 56.8 | $5.63 | 203 |
| Qwen3.7 Max | 56.6 | $1.88 | 199 |
| Gemini 3.5 Flash | 55.3 | $3.38 | 227 |

Qwen3.7 Max is the standout. At $1.88/M, it's 58% cheaper than Gemini 3.1 Pro Preview while giving up just 0.6 quality points and delivering 40% more throughput. Against GPT-5.4, it's 67% cheaper for 0.2 points at nearly the same speed. It's also the only open-source model in this tier, deployable on your own infrastructure if the per-token price still feels too high.
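These comparisons can be reproduced from the table. A Python sketch, assuming the listed price is a single blended $/M figure and speed is output tok/s; `relative_delta` is a hypothetical helper, not a published metric:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float      # quality index from the table
    price_per_m: float  # USD per million tokens (assumed blended)
    speed: float        # output tokens per second


def relative_delta(candidate: Model, baseline: Model) -> dict:
    """Candidate vs baseline: % cheaper, quality points given up, % speed change."""
    return {
        "cheaper_pct": round((1 - candidate.price_per_m / baseline.price_per_m) * 100),
        "quality_given_up": round(baseline.quality - candidate.quality, 1),
        "speed_change_pct": round((candidate.speed / baseline.speed - 1) * 100),
    }


qwen = Model("Qwen3.7 Max", 56.6, 1.88, 199)
gemini_pro = Model("Gemini 3.1 Pro Preview", 57.2, 4.50, 142)
gpt54 = Model("GPT-5.4", 56.8, 5.63, 203)

for baseline in (gemini_pro, gpt54):
    print(baseline.name, relative_delta(qwen, baseline))
```

Against Gemini 3.1 Pro Preview this yields 58% cheaper, 0.6 points given up and 40% faster; against GPT-5.4, 67% cheaper, 0.2 points given up and about 2% slower.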

[Chart: Quality comparison]

Speed vs cost: pick your poison

If latency matters, Gemini 3.5 Flash at 227 tok/s is the fastest production model in the dataset. At $3.38/M, it undercuts GPT-5.4 by 40% while running 12% faster, though you give up 1.5 quality points.
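Throughput translates into wall-clock time per response. A rough Python sketch, assuming a steady output rate and ignoring time-to-first-token, which this post does not report; the 1,000-token response size is an arbitrary example:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady output rate.

    Assumption: ignores time-to-first-token and network overhead; real
    latency also depends on prompt length and provider load.
    """
    return output_tokens / tokens_per_second


speeds = {"Gemini 3.5 Flash": 227, "GPT-5.4": 203, "Gemini 3.1 Pro Preview": 142}

for name, tps in speeds.items():
    print(f"{name}: {generation_seconds(1_000, tps):.2f} s per 1,000 output tokens")
```

Under these assumptions, Flash's 12% throughput edge saves roughly half a second per 1,000-token response compared with GPT-5.4 (about 4.41 s vs 4.93 s).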

For batch jobs where cost dominates, MiniMax M3 at $0.52/M and MiMo-V2.5-Pro at $0.54/M both clear 53 quality. That's not frontier, but it's enough for classification, extraction, and routing layers where the top tier would be overkill.
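For batch work the per-token price scales linearly with volume, so small differences compound. A Python sketch, assuming a flat blended price (no input/output split or volume discounts) and a hypothetical volume of 2 billion tokens per month:

```python
PRICES_PER_M = {
    "MiniMax M3": 0.52,
    "MiMo-V2.5-Pro": 0.54,
    "Qwen3.7 Max": 1.88,
    "Claude Opus 4.8": 10.00,
}


def batch_cost_usd(tokens: int, price_per_m: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_m


monthly_tokens = 2_000_000_000  # hypothetical batch volume: 2B tokens/month
for name, price in PRICES_PER_M.items():
    print(f"{name}: ${batch_cost_usd(monthly_tokens, price):,.0f}/month")
```

At that assumed volume, the two budget models land around $1,040 to $1,080 a month, against $20,000 for Opus 4.8.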

[Chart: Output speed]

Open source is now a real option at 56+ quality

Kimi K2.6 and Qwen3.7 Max are the only open-weight models above 53 quality. Qwen3.7 Max's 56.6 puts it within 4.8 points of Opus 4.8, a gap that matters less once you factor in self-hosting economics. For teams already running GPU infra, inference cost drops to electricity and hardware amortization.
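The electricity-and-amortization point can be made concrete with a back-of-the-envelope formula. A Python sketch; every cluster input (GPU count, hourly cost, aggregate throughput, utilization) is a hypothetical placeholder, not a measured figure for Qwen3.7 Max or Kimi K2.6:

```python
def self_hosted_cost_per_m(gpu_hourly_usd: float, num_gpus: int,
                           tokens_per_second: float, utilization: float) -> float:
    """Amortized USD per million tokens for a self-hosted cluster.

    gpu_hourly_usd is a per-GPU figure that should already fold in hardware
    amortization and electricity. tokens_per_second is the cluster's aggregate
    throughput when busy; utilization is the fraction of each hour it is busy.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd * num_gpus / (tokens_per_hour / 1_000_000)


API_PRICE_PER_M = 1.88  # Qwen3.7 Max hosted price quoted in this post

# Hypothetical cluster: 8 GPUs at $2.00/hour each, 5,000 tok/s aggregate when busy.
for utilization in (0.25, 0.5, 1.0):
    cost = self_hosted_cost_per_m(2.00, 8, 5_000, utilization)
    verdict = "cheaper" if cost < API_PRICE_PER_M else "pricier"
    print(f"{utilization:.0%} busy: ${cost:.2f}/M ({verdict} than the API)")
```

The main lever is utilization: with these placeholder inputs, break-even against the $1.88/M API price sits near 47% utilization, and below that the hosted API is cheaper.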

What to watch

  • Whether OpenAI responds to Qwen's pricing. GPT-5.5 (medium) scores 56.7 at $11.25/M, the same price as the full GPT-5.5 but slightly below GPT-5.4, which costs half as much. It looks stranded between the two.
  • Gemini 3.1 Pro Preview leaving preview and locking in its $4.50 price.
  • Any open-source release above 58 quality. That would compress the top tier's pricing power.

For most workloads this week, Qwen3.7 Max is the pick. For latency-critical pipelines, Gemini 3.5 Flash. For top-tier quality without the Fable 5 premium, Claude Opus 4.8.

Use the LLM Selector to filter by your actual throughput and budget constraints.
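The same kind of constraint filter is easy to sketch locally. This Python snippet is not the LLM Selector's implementation; it is a hypothetical `shortlist` function over the four mid-tier rows from the table above, keeping models that meet a minimum speed and a maximum price and ranking them by quality:

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    quality: float
    price_per_m: float  # USD per million tokens
    speed: float        # output tokens per second


ROWS = [
    Row("Gemini 3.1 Pro Preview", 57.2, 4.50, 142),
    Row("GPT-5.4", 56.8, 5.63, 203),
    Row("Qwen3.7 Max", 56.6, 1.88, 199),
    Row("Gemini 3.5 Flash", 55.3, 3.38, 227),
]


def shortlist(rows: list[Row], min_speed: float, max_price: float) -> list[Row]:
    """Keep models meeting both constraints, highest quality first."""
    fits = [r for r in rows if r.speed >= min_speed and r.price_per_m <= max_price]
    return sorted(fits, key=lambda r: r.quality, reverse=True)


for r in shortlist(ROWS, min_speed=180, max_price=5.00):
    print(r.model, r.quality)
```

With a 180 tok/s floor and a $5/M ceiling, Gemini 3.1 Pro Preview drops out on speed and GPT-5.4 on price, leaving Qwen3.7 Max ahead of Gemini 3.5 Flash.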
