AI Model Benchmarks

Explore leaderboards for two benchmarks. Each leaderboard also names a best-value model: the one that delivers the most benchmark performance per dollar.
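The directory does not say how its value ranking is computed. The sketch below assumes value is the benchmark score divided by the price in USD per million tokens, and shows how such a ranking could work. The `Entry` class, the `value_ranking` function, and the model entries are hypothetical illustrations. They are not data from these leaderboards.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float        # benchmark score, e.g. MMLU or GPQA
    price_per_m: float  # assumption: USD per million tokens


def value_ranking(entries: list[Entry]) -> list[Entry]:
    """Rank entries by score per dollar (score / price), highest first.

    Assumption: the directory's real formula is undocumented; score
    divided by price is just one simple choice. Entries with a zero or
    negative price are skipped to avoid dividing by zero.
    """
    priced = [e for e in entries if e.price_per_m > 0]
    return sorted(priced, key=lambda e: e.score / e.price_per_m, reverse=True)


# Hypothetical entries for illustration only; not real leaderboard data.
entries = [
    Entry("model-a", 90.0, 10.0),
    Entry("model-b", 50.0, 0.01),
    Entry("model-c", 70.0, 1.0),
]

for e in value_ranking(entries):
    print(f"{e.model}: {e.score / e.price_per_m:.1f} points per $/M tokens")
```

Under this formula, a cheap model with a modest score can outrank a far stronger but more expensive one. That pattern matches the leaderboards below, where the best-value pick is not a top-three scorer.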

MMLU

120 models scored

#1 Google: Gemini 2.5 Pro (89.5)
#2 Google: Gemini 2.5 Flash (88.2)
#3 MiniMax: MiniMax M2.5 (87.5)
Best value: LiquidAI: LFM2-8B-A1B (score 50.5 at $0.01 per million tokens)

GPQA

117 models scored

#1 Google: Gemini 2.5 Pro (88.7)
#2 DeepSeek: DeepSeek V3.2 Speciale (87.1)
#3 OpenAI: GPT-5.2-Codex (86.0)
Best value: LiquidAI: LFM2-8B-A1B (score 34.4 at $0.01 per million tokens)