TokenPrice.dev

Cross-provider AI FinOps decision intelligence backed by live pricing, benchmarks, performance, and compliance data across 350+ models.

Decision outputs are synthesized AI analysis for informational use only and are not professional legal, medical, or financial advice.

Benchmark scores are aggregated from public sources and may not reflect the latest evaluations.

Benchmark Directory

AI Model Benchmarks

Explore leaderboards for 2 benchmarks. Each leaderboard includes a value ranking showing which models deliver the best performance per dollar.

MMLU

127 models scored

Massive Multitask Language Understanding — tests knowledge across 57 academic subjects.

#1 Google: Gemini 2.5 Flash (88.2)

#2 MiniMax: MiniMax M2.5 (87.5)

#3 MiniMax: MiniMax M2.7 (87.5)

Best value: Meta: Llama 3.1 8B Instruct (47.6 score @ $0.03/M)

GPQA

125 models scored

Graduate-level science questions vetted by domain experts for difficulty.

#1 DeepSeek: DeepSeek V3.2 Speciale (87.1)

#2 OpenAI: GPT-5 Codex (86.0)

#3 OpenAI: GPT-5.2-Codex (86.0)

Best value: Qwen: Qwen3 235B A22B Instruct 2507 (75.3 score @ $0.09/M)
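
The page does not state how the value ranking is computed. As a minimal sketch, one plausible reading of "performance per dollar" is the benchmark score divided by the listed price. The `Entry` type and the `value_ratio` and `rank_by_value` helpers below are hypothetical names introduced for illustration, and "/M" is assumed to mean "per million tokens". The only data used are the two "Best value" entries listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float        # benchmark score as listed on the page
    price_per_m: float  # listed USD price; "/M" is ASSUMED to mean per million tokens


def value_ratio(entry: Entry) -> float:
    """Score points per dollar. An assumed definition, not the site's confirmed formula."""
    if entry.price_per_m <= 0:
        raise ValueError("price must be positive to compute a ratio")
    return entry.score / entry.price_per_m


def rank_by_value(entries: list[Entry]) -> list[Entry]:
    """Sort entries from best to worst value under the assumed ratio."""
    return sorted(entries, key=value_ratio, reverse=True)


# The two "Best value" entries listed on this page, one per benchmark.
mmlu_best = Entry("Meta: Llama 3.1 8B Instruct", 47.6, 0.03)
gpqa_best = Entry("Qwen: Qwen3 235B A22B Instruct 2507", 75.3, 0.09)

for e in (mmlu_best, gpqa_best):
    print(f"{e.model}: {value_ratio(e):.1f} points per dollar")
```

Under this assumed ratio, a low-priced model can lead the value ranking despite a modest score, which is consistent with a 47.6-point model being listed as best value for MMLU. The two entries come from different benchmarks, so their ratios are not directly comparable; `rank_by_value` is only meaningful within a single leaderboard.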