Help & Support
How we compare AI models, where the data comes from, and what quality scores, benchmarks, and pricing metrics mean.
FindLLM is a free, independent aggregator for comparing Large Language Models (LLMs) by quality, speed, and price. It covers every major model family plus open-source rankings, AI agent analytics, task-specific leaderboards, and a cost calculator.
Data comes from public third-party sources and is surfaced in one place. See the About page for the full list of sources.
The Quality Index is a composite score (0–100) created by Artificial Analysis that reflects a model's performance across multiple benchmarks including MMLU, HumanEval, MATH, GPQA, and others. Higher is better. It's the most holistic single measure of a model's capability.
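To make the 0–100 scale concrete, here is a minimal sketch of how a composite score could be assembled from per-benchmark results. Artificial Analysis's actual normalization and weighting are their own; the equal weights and made-up scores below are purely illustrative assumptions.

```python
def quality_index(benchmark_scores: dict[str, float],
                  weights: dict[str, float] | None = None) -> float:
    """Illustrative composite: weighted average of per-benchmark scores
    already normalized to 0-1, scaled to 0-100. Equal weights are an
    assumption, not Artificial Analysis's actual method."""
    weights = weights or {name: 1.0 for name in benchmark_scores}
    total = sum(weights[name] for name in benchmark_scores)
    blended = sum(score * weights[name]
                  for name, score in benchmark_scores.items()) / total
    return 100.0 * blended

# Hypothetical scores (fraction of questions answered correctly):
print(quality_index({"MMLU": 0.86, "HumanEval": 0.90,
                     "MATH": 0.72, "GPQA": 0.55}))  # -> 75.75
```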
Blended price is a weighted average of a model's input and output token prices, expressed as cost per million tokens. We typically weight input to output 3:1, which reflects common real-world usage patterns. This gives you a single, comparable price point across models.
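The 3:1 weighting translates directly into a small formula: three parts input price to one part output price, divided by four. A minimal sketch, with hypothetical prices in USD per million tokens:

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blended $/1M tokens with a 3:1 input-to-output weighting."""
    return (3 * input_price + output_price) / 4

# Hypothetical model priced at $3/1M input and $15/1M output tokens:
print(blended_price(3.00, 15.00))  # -> 6.00
```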
The AI Agents Hub tracks real-time usage data for popular AI agents like Claude Code, Cline, Kilo Code, and OpenClaw. You can see which LLMs each agent uses, monthly token consumption, and growth trends, and compare agents across categories like coding, productivity, and creative tools.
Our Open Source hub ranks open-weight LLMs by efficiency (quality per parameter), tracks HuggingFace downloads and trending scores, and provides hardware-tier recommendations so you can find the best model for your GPU. We cover consumer (0–14B), prosumer (14–72B), and datacenter (72B+) tiers.
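As a rough sketch of the two ideas in this answer, assuming "quality per parameter" means the Quality Index divided by parameter count in billions (the exact normalization is an assumption), with tier boundaries taken from the ranges above:

```python
def efficiency(quality_index: float, params_billions: float) -> float:
    """Quality per parameter: assumed here to be the Quality Index
    divided by parameter count in billions."""
    return quality_index / params_billions

def hardware_tier(params_billions: float) -> str:
    """Map model size to the hub's hardware tiers (boundary handling
    at exactly 14B/72B is an assumption)."""
    if params_billions < 14:
        return "consumer"    # 0-14B
    if params_billions < 72:
        return "prosumer"    # 14-72B
    return "datacenter"      # 72B+

# Hypothetical 8B model with a Quality Index of 62:
print(efficiency(62, 8), hardware_tier(8))  # -> 7.75 consumer
```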
The Agentic leaderboard ranks models by their average coding and general quality scores — the two capabilities most critical for AI agent use cases. This helps you pick the best foundation model for building autonomous agents, coding assistants, and tool-using AI systems.
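The ranking score itself is just the mean of the two quality dimensions, assuming both are reported on the same 0–100 scale:

```python
def agentic_score(coding_quality: float, general_quality: float) -> float:
    """Average of coding and general quality, both on a 0-100 scale."""
    return (coding_quality + general_quality) / 2

print(agentic_score(88.0, 76.0))  # -> 82.0
```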
Start with the LLM Selector tool — answer a few questions about your use case (coding, chat, analysis) and budget, and we'll rank models for you. Or use the Explore page to compare models on a scatter plot of quality vs. price. For specific tasks, check the relevant Leaderboard.
Yes, completely free. No account required, no paywalls. Our goal is to make LLM comparison accessible to everyone.
Data on FindLLM stays current with the underlying sources. Pricing and model metadata change most frequently; benchmark scores update as new evaluations are published.
We track the main benchmarks reported by Artificial Analysis, including MMLU, HumanEval, MATH, GPQA, MT-Bench, and others. Each model's benchmark breakdown is visible on its detail page. We also track output speed (tokens per second) and time-to-first-token latency.
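For reference, here is a hedged sketch of how those two latency metrics are typically measured from a token stream. Conventions vary by evaluator; excluding the first token from the throughput window, as below, is an assumption rather than the sources' documented method.

```python
import time
from typing import Iterable

def measure_stream(tokens: Iterable[str]) -> tuple[float, float]:
    """Return (time_to_first_token_seconds, output_tokens_per_second)
    for any iterable that yields tokens as they are generated."""
    start = time.monotonic()
    first_at = None
    count = 0
    for _ in tokens:
        if first_at is None:
            first_at = time.monotonic()  # first token arrived
        count += 1
    end = time.monotonic()
    ttft = (first_at - start) if first_at is not None else float("inf")
    # Throughput over the generation phase, excluding the first token
    # (a common convention, assumed here).
    gen_time = end - first_at if first_at is not None else 0.0
    speed = (count - 1) / gen_time if count > 1 and gen_time > 0 else 0.0
    return ttft, speed
```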
Check out the Methodology page for a deep dive into how benchmarks and metrics are calculated, or visit the About page to learn more about FindLLM.
FindLLM provides information for educational and comparison purposes only. Benchmark scores, pricing, and performance metrics are sourced from third-party providers and may change without notice. We strive for accuracy but cannot guarantee that all data is current or error-free. Model performance in production may differ from benchmark results. FindLLM is not affiliated with any AI model provider. Always verify critical information directly with providers before making purchasing decisions.