Insights
Deep dives and practical guides on LLM performance, pricing changes, and new model comparisons. (31 posts)
GPT-5.5's reasoning tiers cost the same $11.25/M tokens but deliver very different results
GPT-5.5 high and medium reasoning modes share a price tag but diverge by 2.2 quality points. When does the gap matter?
GPT-5.5 opens a 3-point quality gap, Gemini 3.1 Pro undercuts everyone above it
GPT-5.5 leads at 60.2 quality but costs $11.25/M tokens. Gemini 3.1 Pro matches Opus 4.7 at under half the price. Weekly LLM briefing for April 27.
Which LLM should you pick for coding and software development in April 2026?
Practical guide to choosing the best LLM for coding workloads in April 2026, with benchmarks, pricing, and decision tables.
Kimi K2.6 scores 53.9 at $1.48/M tokens — and that changes who you should call for mid-tier workloads
Kimi K2.6 from MoonshotAI delivers near-GPT-5.3-Codex quality at less than a third of the price. We break down when it wins and when it doesn't.
Claude Opus 4.7 takes the top spot by a hair, Grok 4.20 rewrites the speed equation
Claude Opus 4.7 edges out Gemini 3.1 Pro Preview on quality while Grok 4.20 hits 222 tok/s. Weekly LLM market briefing for April 20, 2026.
When the model changes and nobody tells you: the transparency crisis in frontier AI
Claude Code issue #42796 reveals a deeper problem: frontier AI vendors change model behavior without meaningful disclosure, and users default to cynicism.
Claude Mythos Preview is not a product launch — it's a new access tier for frontier AI
Anthropic's Claude Mythos Preview signals that the strongest coding and cyber models are becoming gated infrastructure, not public products.
Claude Code source leak: what the 512,000 lines actually reveal about Anthropic's agent architecture
Deep analysis of the Claude Code source map leak on March 31, 2026 — what was exposed, what wasn't, and what it means for the coding agent market.
Why agent builders on OpenRouter converge on the same small set of models
Analysis of which models power top AI agent apps on OpenRouter, why each fills a different role, and how to pick a stack by workload.
The five roles inside real agent stacks in 2026
Practitioners aren't picking one model for agents. They're routing across five roles. Here's which models fill each slot and why.
Best LLMs of March 2026: Quality, Speed, and Price Comparison
Top LLMs by quality score, inference speed, and pricing. GPT-5.4 and Gemini 3.1 Pro lead at 57.2 quality, but value varies by workload.
The best cost-per-quality ratio in LLMs right now (March 2026)
Comparing cost-per-quality across top LLMs. MiniMax M2.7 leads at $0.52/M tokens with 49.6 quality, but the full picture is more nuanced.