Skip to main content
Back to Blog

LLM pricing briefing: March 23, 2026

Weekly LLM pricing update covering GLM 5 at $1.11/M, MiniMax M2.7 at $0.52/M, GPT-5.4 Mini, and open-weight momentum.

FindLLMMarch 23, 2026
pricingweekly-briefingopen-sourcebenchmarks

The cheapest models worth using keep getting cheaper. This week, the sub-$2/M tier gained real quality contenders, and the premium tier is starting to look hard to justify for most workloads.

The value tier is no longer a compromise

MiniMax M2.7 (MiniMax) sits at $0.52/M input tokens with a 49.6 quality index. MiniMax M2.7 scores within striking distance of GPT-5.2 (51.3 quality) at roughly one-ninth the price. The 43 tok/s inference speed is the trade-off — batch-heavy workloads won't mind, but interactive use will feel sluggish.

Z.ai: GLM 5 (Z AI) lands at $1.11/M with 49.8 quality and 89 tok/s. GLM 5 is open source, which matters for self-hosting economics. At that quality-to-price ratio, it's the strongest open-source option on the board right now.

OpenAI: GPT-5.4 Mini (OpenAI) at $1.69/M delivers 48.1 quality and a blistering 237 tok/s. GPT-5.4 Mini is the speed leader by a wide margin — nearly double the next fastest model in this set. For latency-sensitive applications where you need fast iteration, nothing else comes close.

ModelQualityPrice/1MSpeed
MiniMax M2.749.6$0.5243 tok/s
GLM 549.8$1.1189 tok/s
GPT-5.4 Mini48.1$1.69237 tok/s
GPT-5.457.2$5.6385 tok/s
Claude Opus 4.653.0$10.0051 tok/s

Price comparison

Premium models: tied at the top, split on price

Gemini 3.1 Pro Preview (Google) and GPT-5.4 (OpenAI) share an identical 57.2 quality index. Google undercuts OpenAI by $1.13/M ($4.50 vs. $5.63) and runs faster at 117 tok/s versus 85 tok/s. For general-purpose premium work, Gemini 3.1 Pro is the better deal on both axes.

Anthropic's adaptive reasoning models remain the most expensive options at $10.00/M. Claude Opus 4.6 hits 53.0 quality — 4 points below the leaders. That's a tough sell at nearly double the price unless your workflow specifically benefits from Anthropic's reasoning approach.

Community signal: open weights accelerating

The biggest community story this week is MiniMax confirming M2.7 will go open-weight (555 upvotes on r/LocalLLaMA). Combined with Alibaba reaffirming ongoing open-source releases for Qwen and Wan, the self-hosting ecosystem is pulling in competitive models fast. Multiple threads show users running Qwen 3.5 variants on consumer hardware, including V100s hitting 115 tok/s on Qwen Coder 30B.

What to do with this

If you're running batch inference or retrieval pipelines, MiniMax M2.7 at $0.52/M is hard to beat. For interactive applications where latency matters, GPT-5.4 Mini at 237 tok/s is the pick. Premium users should benchmark Gemini 3.1 Pro against GPT-5.4 before defaulting to OpenAI — same quality, lower cost.

Find the right model for your workload constraints with the LLM Selector, or browse current pricing on Explore.

Stay in the loop

Weekly LLM analysis delivered to your inbox. No spam.