GPT-5.4 is the model OpenAI doesn't want you to notice
GPT-5.4 scores 56.8 quality at $5.63/M tokens and 90 tok/s, quietly outperforming its pricier siblings on value.
FindLLM, May 27, 2026
gpt-5-4, openai, value-analysis, model-comparison
The overlooked middle child
OpenAI's lineup has a pricing problem, and GPT-5.4 (gpt-5-4) is the evidence. At 56.8 quality and $5.63/M tokens, it delivers about 94% of GPT-5.5's quality score (56.8 vs. 60.2) at half the cost. That ratio matters more than any headline benchmark, because most production workloads are cost-constrained, not quality-constrained.
OpenAI currently sells five models in the leaderboard's top 15. The marketing push is clearly behind GPT-5.5 and its variants. But when you line up the numbers, GPT-5.4 occupies the most defensible position in the entire OpenAI portfolio for teams that need strong general quality without burning budget.
The math that makes GPT-5.4 interesting
Here's the core comparison across OpenAI's own family:
GPT-5.4 delivers nearly double the quality-per-dollar of any GPT-5.5 variant. It also runs at 90 tokens per second, which is 25% faster than GPT-5.5's 72 tok/s and 36% faster than GPT-5.5 (high). For interactive applications where inference latency shapes user experience, that gap is real.
The GPT-5.5 (medium) variant is particularly damning here. It scores 56.7, essentially identical to GPT-5.4's 56.8, but costs almost exactly twice as much ($11.25/M vs. $5.63/M) and runs 24% slower. I genuinely cannot construct a workload where GPT-5.5 (medium) is the right choice over GPT-5.4.
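The value claim is easy to check by hand. Here is a minimal sketch that uses only figures quoted in this article; the `Model` record and the `quality_per_dollar` helper are illustrative names, not part of any leaderboard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float           # leaderboard quality score quoted in the article
    usd_per_m_tokens: float  # price per million tokens quoted in the article


def quality_per_dollar(m: Model) -> float:
    """Quality points per dollar spent on one million tokens."""
    return m.quality / m.usd_per_m_tokens


gpt_5_4 = Model("GPT-5.4", 56.8, 5.63)
gpt_5_5_medium = Model("GPT-5.5 (medium)", 56.7, 11.25)

# How much more quality-per-dollar GPT-5.4 buys than GPT-5.5 (medium).
value_ratio = quality_per_dollar(gpt_5_4) / quality_per_dollar(gpt_5_5_medium)

# GPT-5.4 (90 tok/s) versus GPT-5.5 (72 tok/s).
speed_ratio = 90 / 72

print(f"value ratio: {value_ratio:.2f}x")
print(f"speed ratio: {speed_ratio:.2f}x")
```

Quality-per-dollar is a crude metric, since it treats every quality point as equally valuable, but it is enough to show why the medium variant is hard to justify on these numbers.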
How does it stack up outside the OpenAI family?
The competitive picture gets more nuanced when you look beyond OpenAI. GPT-5.4 sits in a contested middle tier where several strong models compete.
Gemini 3.1 Pro Preview scores 57.2 quality at $4.50/M tokens and 125 tok/s. That's slightly higher quality, lower price, and significantly faster throughput. On paper, Gemini 3.1 Pro dominates GPT-5.4 on every axis. The caveat: it's still a preview model, which introduces uncertainty about rate limits, availability guarantees, and whether the final release will match these numbers.
Qwen3.7 Max from Alibaba scores 56.6 at $1.88/M tokens and 205 tok/s. Nearly identical quality to GPT-5.4 at one-third the price and more than double the throughput. For batch processing, summarization pipelines, or any workload where you're burning through millions of tokens daily, Qwen3.7 Max is the obvious volume play.
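To put the price gap in budget terms, here is a rough sketch of monthly spend at a steady daily volume. The prices are the ones quoted in this article; the 50-million-tokens-per-day workload and the `monthly_cost` helper are hypothetical, chosen only for illustration.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES = {
    "GPT-5.4": 5.63,
    "Gemini 3.1 Pro Preview": 4.50,
    "Qwen3.7 Max": 1.88,
}


def monthly_cost(usd_per_m_tokens: float, tokens_per_day: int, days: int = 30) -> float:
    """Spend for a steady daily token volume over `days` days."""
    return usd_per_m_tokens * tokens_per_day / 1_000_000 * days


# Hypothetical workload: 50 million tokens per day (an assumption, not a measured figure).
TOKENS_PER_DAY = 50_000_000

costs = {name: monthly_cost(price, TOKENS_PER_DAY) for name, price in PRICES.items()}

# Print cheapest first.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:<24} ${cost:>10,.2f}/month")
```

Swap in your own volume before drawing conclusions. The absolute gap scales linearly with tokens, so it only becomes a budget line item at high throughput.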
So why would anyone pick GPT-5.4? Two reasons. First, OpenAI's API ecosystem is the most mature in the industry. Function calling, structured outputs, fine-tuning infrastructure, and tool use all work with fewer edge cases. If your stack is already built on OpenAI, switching to Gemini 3.1 Pro Preview to save $1.13/M tokens introduces migration risk that may not justify the savings. Second, Qwen3.7 Max runs through Alibaba's infrastructure, which introduces latency and compliance considerations for teams operating under US or EU data residency requirements.
Where GPT-5.4 falls short
I won't pretend this model is universally compelling. The 3.4-point quality gap between GPT-5.4 (56.8) and GPT-5.5 (60.2) is not trivial. For complex reasoning chains, multi-step agentic workflows, or tasks where a few percentage points of accuracy compound across dozens of steps, that gap translates to meaningfully higher failure rates. If you're building an agent that chains 10+ LLM calls, each small quality deficit multiplies.
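To see why small gaps compound, assume each call in a chain succeeds independently with the same probability; the whole chain then succeeds with that probability raised to the number of steps. The sketch below uses hypothetical per-step success rates. They are not derived from the leaderboard quality scores, which do not map directly onto success probabilities.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Hypothetical per-step success rates for a stronger and a weaker model.
for rate in (0.97, 0.95):
    for steps in (1, 10, 20):
        print(f"per-step {rate:.2f}, {steps:>2} steps: {chain_success(rate, steps):.1%}")
```

Under these assumptions, a two-point gap per step grows to roughly a 14-point gap in end-to-end success over 10 steps. Real agents retry and recover from errors, so treat this as a pessimistic illustration rather than a prediction.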
The speed advantage over GPT-5.5 also matters less than it might seem. At 90 tok/s, GPT-5.4 is fast enough for most interactive use cases, but it's nowhere near the throughput leaders. Gemini 3.5 Flash pushes 230 tok/s. Qwen3.7 Max hits 205 tok/s. If your bottleneck is generation throughput rather than quality, GPT-5.4 isn't the answer.
The real argument for GPT-5.4
The case for this model is specific: you need quality above 55, you're already on OpenAI's platform, and you're running workloads where cost scales with volume. Think classification pipelines, content generation at scale, customer-facing chat where quality needs to be good but doesn't need to be the absolute best. At $5.63/M tokens, you can run roughly twice the volume of GPT-5.5 for the same budget, at the cost of a 3.4-point drop in quality score.
For teams currently running GPT-5.5 (medium) at $11.25/M, the switch is a no-brainer. Same quality, half the cost, faster inference. That's not a trade-off. That's a billing error.
For everyone else, the decision is harder. Gemini 3.1 Pro is cheaper and faster with marginally better quality. Qwen3.7 Max is dramatically cheaper. GPT-5.4's advantage is ecosystem lock-in, which is a real factor but not one that shows up in benchmark tables.
Who should use this model
If you're spending more than $10K/month on OpenAI API calls and running GPT-5.5 variants for tasks that don't strictly require 60+ quality, GPT-5.4 likely cuts your bill in half. Run a quality evaluation on your specific workload, compare outputs at both tiers, and let the data decide.
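A quality evaluation does not need to be elaborate. One simple approach, sketched here under stated assumptions, is to score both tiers on the same prompts and compare the paired results. The `compare_tiers` helper and the 1-to-5 rubric scores are hypothetical; how you produce the scores (human review, an automated grader, task-specific checks) depends on your workload.

```python
from statistics import mean


def compare_tiers(scores_cheap: list[float], scores_premium: list[float]) -> dict[str, float]:
    """Summarize paired per-prompt scores for two models run on the same prompts."""
    if not scores_cheap or len(scores_cheap) != len(scores_premium):
        raise ValueError("need two non-empty score lists of equal length")
    # Positive difference means the premium model scored higher on that prompt.
    diffs = [premium - cheap for cheap, premium in zip(scores_cheap, scores_premium)]
    return {
        "mean_gap": mean(diffs),
        "premium_win_rate": sum(d > 0 for d in diffs) / len(diffs),
        "cheap_win_rate": sum(d < 0 for d in diffs) / len(diffs),
    }


# Hypothetical rubric scores (1 to 5) for five prompts; replace with your own data.
cheap_scores = [4, 4, 5, 3, 4]
premium_scores = [4, 5, 5, 4, 3]

summary = compare_tiers(cheap_scores, premium_scores)
print(summary)
```

Five prompts is far too few to decide anything. With a few hundred representative prompts, a small mean gap and a premium win rate close to the cheap win rate would suggest the cheaper tier is good enough for that workload.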
If you're not locked into OpenAI, use the LLM Selector to filter by your actual constraints. The 56–57 quality band is crowded, and the right choice depends on your latency requirements, data residency needs, and whether you value open-source weights. Browse the full rankings on Explore to see where GPT-5.4 fits relative to your current model.