Methodology

Data Sources

FindLLM aggregates data from multiple trusted sources to provide a comprehensive view of the LLM landscape.

Artificial Analysis — benchmarks, quality scores, speed metrics, and time-to-first-token measurements.
OpenRouter — real-time pricing, context lengths, provider availability, and model metadata.
HuggingFace — downloads, trending scores, licenses, parameter counts, and open source metadata.

Quality Index

The Quality Index is a composite score (0–100) created by Artificial Analysis that reflects a model's performance across multiple benchmarks. It provides a single, comparable measure of overall model capability.

Benchmarks

We track the following benchmarks:

MMLU-Pro — Massive Multitask Language Understanding with harder questions.
GPQA Diamond — Graduate-level science questions at highest difficulty.
HumanEval / LiveCodeBench — Code generation and problem solving.
MATH / AIME — Mathematical reasoning at competition level.
IFEval — Instruction-following across diverse tasks.
MT-Bench — Multi-turn conversation quality.
RULER — Long context recall and utilization.

Speed Metrics

Output speed is measured in tokens per second (tok/s) as reported by Artificial Analysis. Time to First Token (TTFT) measures the latency before the first token appears. Both are measured under standardized conditions.
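As an illustration of how the two metrics relate, the sketch below derives TTFT and output speed from the timestamps of a single streamed response. It is not Artificial Analysis's measurement code: the `StreamTiming` record, its field names, and the choice to time generation from the first token to the last are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical timing record for one streamed response (times in seconds)."""
    request_sent: float   # when the request was sent
    first_token: float    # when the first output token arrived
    last_token: float     # when the final output token arrived
    output_tokens: int    # total number of output tokens received


def ttft_seconds(t: StreamTiming) -> float:
    """Time to First Token: delay between sending the request and the first token."""
    return t.first_token - t.request_sent


def output_speed(t: StreamTiming) -> float:
    """Output speed in tok/s, measured over the generation phase only.

    Assumption: the TTFT wait is excluded, so speed reflects streaming throughput.
    """
    generation_time = t.last_token - t.first_token
    if generation_time <= 0:
        raise ValueError("generation time must be positive")
    return t.output_tokens / generation_time


timing = StreamTiming(request_sent=0.0, first_token=0.5, last_token=2.5, output_tokens=200)
print(ttft_seconds(timing))   # 0.5
print(output_speed(timing))   # 100.0
```

Keeping the two numbers separate matters because a model can start responding quickly yet stream slowly, or the reverse.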

Pricing

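Prices are sourced from OpenRouter's model-level pricing, which reflects the default rate available through its API. The blended price is a weighted average of the input and output token prices at a 3:1 input-to-output ratio (input tokens carry three times the weight of output tokens), reflecting typical usage patterns. All prices are per million tokens.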
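The 3:1 weighting can be written out as a short calculation. This is a minimal sketch, assuming the blended price is the plain weighted mean (3 × input + 1 × output) / 4; the function name `blended_price` and the example prices are hypothetical and not taken from any real model.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices (both per million tokens).

    The default weights encode the 3:1 input-to-output ratio.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Hypothetical model: 3.00 per million input tokens, 15.00 per million output tokens.
print(blended_price(3.00, 15.00))  # 6.0
```

Because input tokens dominate the weighting, a model with cheap input and expensive output will show a blended price much closer to its input price.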

Update Frequency

Pricing from OpenRouter and model metadata from HuggingFace are refreshed hourly. Benchmark data from Artificial Analysis is refreshed every 6 hours. Data for AI media models is refreshed every 8 hours.
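One way to picture this schedule is as a table of intervals checked against each dataset's last refresh time. The sketch below is illustrative only: the dataset keys, the `REFRESH_INTERVALS` mapping, and the `is_stale` helper are hypothetical names, not FindLLM's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Refresh intervals as described above; the keys are hypothetical labels.
REFRESH_INTERVALS = {
    "openrouter_pricing": timedelta(hours=1),
    "huggingface_metadata": timedelta(hours=1),
    "artificial_analysis_benchmarks": timedelta(hours=6),
    "ai_media_models": timedelta(hours=8),
}


def is_stale(dataset: str, last_refreshed: datetime,
             now: Optional[datetime] = None) -> bool:
    """Return True once the dataset's refresh interval has fully elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_refreshed >= REFRESH_INTERVALS[dataset]


last = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)
print(is_stale("openrouter_pricing", last, now))              # True
print(is_stale("artificial_analysis_benchmarks", last, now))  # False
```

In practice this means a displayed benchmark score can lag its source by up to 6 hours, while pricing lags by at most about an hour.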

Glossary of Terms

Quality Index: Composite benchmark score (0–100) from Artificial Analysis.
Blended Price: Weighted average of input and output token prices (3:1 ratio).
tok/s: Tokens per second, the output generation speed.
TTFT: Time to First Token, the latency before the first response token arrives.
Context Window: Maximum number of tokens a model can process in a single request.
Open Source: Models with publicly available weights for download and self-hosting.
Provider: A service that hosts and serves the model via API (e.g., OpenRouter, Together, Fireworks).
Parameters: The number of trainable weights in a model, indicating its size and capacity.