Scientific Reasoning Leaderboard
Models ranked by GPQA Diamond, based on independent benchmark evaluations.
Each row shows the model's benchmark score alongside its pricing and output speed, so you can evaluate quality-to-cost tradeoffs at a glance.
Top 20 models ranked by GPQA Diamond
| Rank | Model | GPQA Diamond |
|---|---|---|
| 🥇 | | 0.9 |
| 🥈 | OpenAI | 0.9 |
| 🥉 | OpenAI | 0.9 |
| 4 | MiniMax | 0.9 |
| 5 | Anthropic | 0.9 |
| 6 | OpenAI | 0.9 |
| 7 | Alibaba | 0.9 |
| 8 | | 0.9 |
| 9 | | 0.9 |
| 10 | Anthropic | 0.9 |
| 11 | OpenAI | 0.9 |
| 12 | OpenAI | 0.9 |
| 13 | Anthropic | 0.9 |
| 14 | Kimi | 0.9 |
| 15 | xAI | 0.9 |
| 16 | OpenAI | 0.9 |
| 17 | | 0.9 |
| 18 | DeepSeek | 0.9 |
| 19 | OpenAI | 0.9 |
| 20 | xAI | 0.9 |