Benchmark Leaderboard

MMLU

120 models with MMLU scores, ranked from highest to lowest.

Score Leaderboard

| # | Model | Developer | Score |
|---|---|---|---|
| 1 | Google: Gemini 2.5 Pro | google | 89.5 |
| 2 | Google: Gemini 2.5 Flash | google | 88.2 |
| 3 | MiniMax: MiniMax M2.5 | minimax | 87.5 |
| 4 | Anthropic: Claude 3.7 Sonnet (thinking) | anthropic | 87.5 |
| 5 | OpenAI: GPT-5 | openai | 87.1 |
| 6 | OpenAI: GPT-5 Codex | openai | 86.5 |
| 7 | DeepSeek: DeepSeek V3.2 Speciale | deepseek | 86.3 |
| 8 | OpenAI: GPT-5.1-Codex | openai | 86.0 |
| 9 | OpenAI: GPT-5.2-Codex | openai | 86.0 |
| 10 | OpenAI: o3 | openai | 85.3 |
| 11 | DeepSeek: R1 | deepseek | 84.9 |
| 12 | MoonshotAI: Kimi K2 Thinking | moonshotai | 84.8 |
| 13 | Qwen: Qwen3 235B A22B Thinking 2507 | qwen | 84.3 |
| 14 | Qwen: Qwen3 Max | qwen | 84.1 |
| 15 | OpenAI: o1 | openai | 84.1 |
| 16 | DeepSeek: DeepSeek V3.2 | deepseek | 83.7 |
| 17 | Google: Gemini 2.5 Pro Preview 05-06 | google | 83.7 |
| 18 | OpenAI: GPT-5 Mini | openai | 83.7 |
| 19 | Qwen: Qwen3 VL 235B A22B Thinking | qwen | 83.6 |
| 20 | DeepSeek: DeepSeek V3.1 Terminus | deepseek | 83.6 |
| 21 | OpenAI: o4 Mini | openai | 83.2 |
| 22 | Nous: Hermes 3 405B Instruct | nousresearch | 82.9 |
| 23 | Qwen: Qwen3 235B A22B Instruct 2507 | qwen | 82.8 |
| 24 | xAI: Grok 3 Mini | x-ai | 82.8 |
| 25 | MoonshotAI: Kimi K2.5 | moonshotai | 82.4 |
| 26 | Qwen: Qwen3 Max Thinking | qwen | 82.4 |
| 27 | Qwen: Qwen3 Next 80B A3B Thinking | qwen | 82.4 |
| 28 | Qwen: Qwen3 VL 235B A22B Instruct | qwen | 82.3 |
| 29 | Prime Intellect: INTELLECT-3 | prime-intellect | 82.2 |
| 30 | MiniMax: MiniMax M2 | minimax | 82.0 |
| 31 | OpenAI: GPT-5.1-Codex-Mini | openai | 82.0 |
| 32 | MoonshotAI: Kimi K2 0905 | moonshotai | 81.9 |
| 33 | Qwen: Qwen3 Next 80B A3B Instruct | qwen | 81.9 |
| 34 | Qwen: Qwen3 VL 32B Instruct | qwen | 81.8 |
| 35 | Z.ai: GLM 4.5 Air | z-ai | 81.5 |
| 36 | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | nvidia | 81.4 |
| 37 | Kwaipilot: KAT-Coder-Pro V1 | kwaipilot | 81.3 |
| 38 | OpenAI: gpt-oss-120b | openai | 80.8 |
| 39 | MiniMax: MiniMax M1 | minimax | 80.8 |
| 40 | Mistral Large | mistralai | 80.7 |
| 41 | Qwen: Qwen3 VL 30B A3B Thinking | qwen | 80.7 |
| 42 | Qwen: Qwen3 30B A3B Thinking 2507 | qwen | 80.5 |
| 43 | Upstage: Solar Pro 3 | upstage | 80.5 |
| 44 | OpenAI: o3 Mini High | openai | 80.2 |
| 45 | Z.ai: GLM 4.6V | z-ai | 79.9 |
| 46 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google | 79.6 |
| 47 | DeepSeek: R1 Distill Llama 70B | deepseek | 79.5 |
| 48 | NVIDIA: Nemotron 3 Nano 30B A3B (free) | nvidia | 79.4 |
| 49 | xAI: Grok Code Fast 1 | x-ai | 79.3 |
| 50 | OpenAI: o3 Mini | openai | 79.1 |
| 51 | OpenAI: GPT-5 Nano | openai | 78.0 |
| 52 | Qwen: Qwen3 30B A3B | qwen | 77.7 |
| 53 | Baidu: ERNIE 4.5 300B A47B | baidu | 77.6 |
| 54 | OpenAI: GPT-4.1 Mini | openai | 77.5 |
| 55 | Anthropic: Claude Sonnet 4.6 | anthropic | 77.2 |
| 56 | Qwen: Qwen3 VL 30B A3B Instruct | qwen | 76.4 |
| 57 | AllenAI: Olmo 3.1 32B Think | allenai | 76.3 |
| 58 | Qwen: Qwen3 235B A22B | qwen | 76.2 |
| 59 | Anthropic: Claude Haiku 4.5 | anthropic | 76.0 |
| 60 | NVIDIA: Nemotron Nano 12B 2 VL (free) | nvidia | 75.9 |
| 61 | Google: Gemini 2.5 Flash Lite | google | 75.9 |
| 62 | Perplexity: Sonar Pro | perplexity | 75.5 |
| 63 | Z.ai: GLM 4.5V | z-ai | 75.1 |
| 64 | Qwen: Qwen3 VL 8B Thinking | qwen | 74.9 |
| 65 | OpenAI: GPT-4o | openai | 74.8 |
| 66 | Xiaomi: MiMo-V2-Flash | xiaomi | 74.4 |
| 67 | xAI: Grok 4 Fast | x-ai | 74.3 |
| 68 | xAI: Grok 4.1 Fast | x-ai | 74.3 |
| 69 | NVIDIA: Nemotron Nano 9B V2 (free) | nvidia | 74.2 |
| 70 | NVIDIA: Nemotron Nano 9B V2 | nvidia | 74.2 |
| 71 | OpenAI: GPT-4o (2024-05-13) | openai | 74.0 |
| 72 | DeepSeek: R1 Distill Qwen 32B | deepseek | 73.9 |
| 73 | Meta: Llama 3.1 405B Instruct | meta-llama | 73.2 |
| 74 | Qwen: Qwen3 32B | qwen | 72.7 |
| 75 | Qwen: Qwen3 30B A3B Instruct 2507 | qwen | 72.5 |
| 76 | Google: Gemini 2.0 Flash Lite | google | 72.4 |
| 77 | Qwen: Qwen2.5 VL 72B Instruct | qwen | 72.0 |
| 78 | OpenAI: gpt-oss-20b | openai | 71.8 |
| 79 | NVIDIA: Llama 3.1 Nemotron 70B Instruct | nvidia | 71.3 |
| 80 | Meta: Llama 3.3 70B Instruct | meta-llama | 71.3 |
| 81 | Cohere: Command A | cohere | 71.2 |
| 82 | Mistral: Devstral Medium | mistralai | 70.8 |
| 83 | Qwen: Qwen3 Coder 30B A3B Instruct | qwen | 70.6 |
| 84 | xAI: Grok 3 Beta | x-ai | 70.3 |
| 85 | Mistral: Pixtral Large 2411 | mistralai | 70.1 |
| 86 | Anthropic: Claude Opus 4.5 | anthropic | 69.6 |
| 87 | Anthropic: Claude Opus 4.6 | anthropic | 69.6 |
| 88 | OpenAI: GPT-4 Turbo | openai | 69.4 |
| 89 | Perplexity: Sonar | perplexity | 68.9 |
| 90 | Qwen: Qwen3 VL 8B Instruct | qwen | 68.6 |
| 91 | Mistral: Mistral Medium 3.1 | mistralai | 68.3 |
| 92 | Mistral: Mistral Medium 3 | mistralai | 68.3 |
| 93 | Mistral Large 2407 | mistralai | 68.3 |
| 94 | Mistral: Mistral Small 3 | mistralai | 68.1 |
| 95 | Mistral: Devstral Small 1.1 | mistralai | 67.8 |
| 96 | Qwen: Qwen3 14B | qwen | 67.5 |
| 97 | Nous: Hermes 3 70B Instruct | nousresearch | 66.4 |
| 98 | AllenAI: Olmo 3 7B Think | allenai | 65.5 |
| 99 | NVIDIA: Nemotron Nano 12B 2 VL | nvidia | 64.9 |
| 100 | Qwen: QwQ 32B | qwen | 64.8 |
| 101 | OpenAI: GPT-4o-mini | openai | 64.8 |
| 102 | Qwen: Qwen3 8B | qwen | 64.3 |
| 103 | Qwen: Qwen2.5 VL 32B Instruct | qwen | 63.5 |
| 104 | Qwen2.5 Coder 32B Instruct | qwen | 63.5 |
| 105 | Qwen: Qwen-Turbo | qwen | 63.3 |
| 106 | Mistral: Saba | mistralai | 61.1 |
| 107 | NVIDIA: Nemotron 3 Nano 30B A3B | nvidia | 57.9 |
| 108 | AI21: Jamba Large 1.7 | ai21 | 57.7 |
| 109 | AllenAI: Olmo 3 7B Instruct | allenai | 52.2 |
| 110 | LiquidAI: LFM2-8B-A1B | liquid | 50.5 |
| 111 | Google: Gemma 3n 4B | google | 48.8 |
| 112 | Meta: Llama 3.1 8B Instruct | meta-llama | 47.6 |
| 113 | Qwen: Qwen2.5 Coder 7B Instruct | qwen | 47.3 |
| 114 | Meta: Llama 3.2 11B Vision Instruct | meta-llama | 46.4 |
| 115 | OpenAI: GPT-3.5 Turbo | openai | 46.2 |
| 116 | Cohere: Command R+ (08-2024) | cohere | 43.2 |
| 117 | Mistral: Mixtral 8x7B Instruct | mistralai | 38.7 |
| 118 | Meta: Llama 3.2 3B Instruct | meta-llama | 34.7 |
| 119 | Mistral: Mistral 7B Instruct v0.1 | mistralai | 24.5 |
| 120 | Meta: Llama 3.2 1B Instruct | meta-llama | 20.0 |

Value Rankings

Best performance per dollar: score divided by the average price in dollars per million tokens.

| # | Model | Score | Avg $/M tokens | Value |
|---|---|---|---|---|
| 1 | LiquidAI: LFM2-8B-A1B | 50.5 | $0.01 | 3366.7 |
| 2 | Google: Gemma 3n 4B | 48.8 | $0.03 | 1626.7 |
| 3 | Meta: Llama 3.1 8B Instruct | 47.6 | $0.03 | 1360.0 |
| 4 | Mistral: Mistral Small 3 | 68.1 | $0.07 | 1047.7 |
| 5 | Qwen: Qwen3 235B A22B Instruct 2507 | 82.8 | $0.09 | 968.4 |
| 6 | Meta: Llama 3.2 11B Vision Instruct | 46.4 | $0.05 | 946.9 |
| 7 | OpenAI: gpt-oss-20b | 71.8 | $0.09 | 844.7 |
| 8 | Qwen: Qwen2.5 Coder 7B Instruct | 47.3 | $0.06 | 788.3 |
| 9 | Qwen: Qwen-Turbo | 63.3 | $0.08 | 779.1 |
| 10 | NVIDIA: Nemotron Nano 9B V2 | 74.2 | $0.10 | 742.0 |
| 11 | OpenAI: gpt-oss-120b | 80.8 | $0.11 | 705.7 |
| 12 | NVIDIA: Nemotron 3 Nano 30B A3B | 57.9 | $0.12 | 463.2 |
| 13 | Qwen: Qwen3 32B | 72.7 | $0.16 | 454.4 |
| 14 | Qwen: Qwen3 14B | 67.5 | $0.15 | 450.0 |
| 15 | Qwen: Qwen3 30B A3B | 77.7 | $0.18 | 431.7 |
| 16 | Qwen: Qwen3 Coder 30B A3B Instruct | 70.6 | $0.17 | 415.3 |
| 17 | Qwen: Qwen3 30B A3B Thinking 2507 | 80.5 | $0.20 | 411.8 |
| 18 | AllenAI: Olmo 3 7B Think | 65.5 | $0.16 | 409.4 |
| 19 | Xiaomi: MiMo-V2-Flash | 74.4 | $0.19 | 391.6 |
| 20 | Google: Gemini 2.0 Flash Lite | 72.4 | $0.19 | 386.1 |
| 21 | Qwen: Qwen3 30B A3B Instruct 2507 | 72.5 | $0.20 | 371.8 |
| 22 | AllenAI: Olmo 3 7B Instruct | 52.2 | $0.15 | 348.0 |
| 23 | OpenAI: GPT-5 Nano | 78.0 | $0.22 | 346.7 |
| 24 | Meta: Llama 3.3 70B Instruct | 71.3 | $0.21 | 339.5 |
| 25 | Mistral: Devstral Small 1.1 | 67.8 | $0.20 | 339.0 |
| 26 | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | 81.4 | $0.25 | 325.6 |
| 27 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | 79.6 | $0.25 | 318.4 |
| 28 | Qwen2.5 Coder 32B Instruct | 63.5 | $0.20 | 317.5 |
| 29 | Qwen: Qwen3 VL 32B Instruct | 81.8 | $0.26 | 314.6 |
| 30 | Google: Gemini 2.5 Flash Lite | 75.9 | $0.25 | 303.6 |
| 31 | Qwen: Qwen3 8B | 64.3 | $0.22 | 285.8 |
| 32 | DeepSeek: DeepSeek V3.2 | 83.7 | $0.32 | 257.5 |
| 33 | DeepSeek: R1 Distill Qwen 32B | 73.9 | $0.29 | 254.8 |
| 34 | Qwen: Qwen3 235B A22B Thinking 2507 | 84.3 | $0.35 | 237.5 |
| 35 | Qwen: Qwen3 VL 8B Instruct | 68.6 | $0.29 | 236.6 |
| 36 | Qwen: QwQ 32B | 64.8 | $0.27 | 235.6 |
| 37 | Qwen: Qwen3 VL 30B A3B Instruct | 76.4 | $0.33 | 235.1 |
| 38 | AllenAI: Olmo 3.1 32B Think | 76.3 | $0.33 | 234.8 |
| 39 | Nous: Hermes 3 70B Instruct | 66.4 | $0.30 | 221.3 |
| 40 | Upstage: Solar Pro 3 | 80.5 | $0.38 | 214.7 |
| 41 | xAI: Grok 4 Fast | 74.3 | $0.35 | 212.3 |
| 42 | xAI: Grok 4.1 Fast | 74.3 | $0.35 | 212.3 |
| 43 | xAI: Grok 3 Mini | 82.8 | $0.40 | 207.0 |
| 44 | Meta: Llama 3.2 3B Instruct | 34.7 | $0.20 | 177.5 |
| 45 | Meta: Llama 3.2 1B Instruct | 20.0 | $0.11 | 176.2 |
| 46 | OpenAI: GPT-4o-mini | 64.8 | $0.38 | 172.8 |
| 47 | DeepSeek: DeepSeek V3.1 Terminus | 83.6 | $0.50 | 167.2 |
| 48 | Z.ai: GLM 4.5 Air | 81.5 | $0.49 | 166.3 |
| 49 | Mistral: Mistral 7B Instruct v0.1 | 24.5 | $0.15 | 163.3 |
| 50 | NVIDIA: Nemotron Nano 12B 2 VL | 64.9 | $0.40 | 162.3 |
| 51 | Qwen: Qwen2.5 VL 32B Instruct | 63.5 | $0.40 | 158.8 |
| 52 | Kwaipilot: KAT-Coder-Pro V1 | 81.3 | $0.52 | 157.1 |
| 53 | Mistral: Saba | 61.1 | $0.40 | 152.8 |
| 54 | Qwen: Qwen3 VL 235B A22B Instruct | 82.3 | $0.54 | 152.4 |
| 55 | Qwen: Qwen3 Next 80B A3B Instruct | 81.9 | $0.60 | 137.6 |
| 56 | Z.ai: GLM 4.6V | 79.9 | $0.60 | 133.2 |
| 57 | MiniMax: MiniMax M2 | 82.0 | $0.63 | 130.7 |
| 58 | Prime Intellect: INTELLECT-3 | 82.2 | $0.65 | 126.5 |
| 59 | Qwen: Qwen3 Next 80B A3B Thinking | 82.4 | $0.67 | 122.1 |
| 60 | MiniMax: MiniMax M2.5 | 87.5 | $0.75 | 117.1 |
| 61 | Baidu: ERNIE 4.5 300B A47B | 77.6 | $0.69 | 112.5 |
| 62 | DeepSeek: DeepSeek V3.2 Speciale | 86.3 | $0.80 | 107.9 |
| 63 | DeepSeek: R1 Distill Llama 70B | 79.5 | $0.75 | 106.0 |
| 64 | Qwen: Qwen3 VL 8B Thinking | 74.9 | $0.74 | 101.1 |
| 65 | xAI: Grok Code Fast 1 | 79.3 | $0.85 | 93.3 |
| 66 | Qwen: Qwen2.5 VL 72B Instruct | 72.0 | $0.80 | 90.0 |
| 67 | Nous: Hermes 3 405B Instruct | 82.9 | $1.00 | 82.9 |
| 68 | OpenAI: GPT-4.1 Mini | 77.5 | $1.00 | 77.5 |
| 69 | OpenAI: GPT-5 Mini | 83.7 | $1.13 | 74.4 |
| 70 | OpenAI: GPT-5.1-Codex-Mini | 82.0 | $1.13 | 72.9 |
| 71 | Mistral: Mixtral 8x7B Instruct | 38.7 | $0.54 | 71.7 |
| 72 | Perplexity: Sonar | 68.9 | $1.00 | 68.9 |
| 73 | MoonshotAI: Kimi K2 Thinking | 84.8 | $1.23 | 68.7 |
| 74 | MoonshotAI: Kimi K2 0905 | 81.9 | $1.20 | 68.3 |
| 75 | Qwen: Qwen3 235B A22B | 76.2 | $1.14 | 67.0 |
| 76 | Google: Gemini 2.5 Flash | 88.2 | $1.40 | 63.0 |
| 77 | Z.ai: GLM 4.5V | 75.1 | $1.20 | 62.6 |
| 78 | MoonshotAI: Kimi K2.5 | 82.4 | $1.33 | 62.2 |
| 79 | MiniMax: MiniMax M1 | 80.8 | $1.30 | 62.2 |
| 80 | NVIDIA: Llama 3.1 Nemotron 70B Instruct | 71.3 | $1.20 | 59.4 |
| 81 | Mistral: Devstral Medium | 70.8 | $1.20 | 59.0 |
| 82 | Mistral: Mistral Medium 3.1 | 68.3 | $1.20 | 56.9 |
| 83 | Mistral: Mistral Medium 3 | 68.3 | $1.20 | 56.9 |
| 84 | DeepSeek: R1 | 84.9 | $1.60 | 53.1 |
| 85 | OpenAI: GPT-3.5 Turbo | 46.2 | $1.00 | 46.2 |
| 86 | Qwen: Qwen3 Max Thinking | 82.4 | $2.34 | 35.2 |
| 87 | OpenAI: o4 Mini | 83.2 | $2.75 | 30.3 |
| 88 | OpenAI: o3 Mini High | 80.2 | $2.75 | 29.2 |
| 89 | OpenAI: o3 Mini | 79.1 | $2.75 | 28.8 |
| 90 | Anthropic: Claude Haiku 4.5 | 76.0 | $3.00 | 25.3 |
| 91 | Qwen: Qwen3 Max | 84.1 | $3.60 | 23.4 |
| 92 | Mistral Large | 80.7 | $4.00 | 20.2 |
| 93 | Meta: Llama 3.1 405B Instruct | 73.2 | $4.00 | 18.3 |
| 94 | Mistral: Pixtral Large 2411 | 70.1 | $4.00 | 17.5 |
| 95 | Mistral Large 2407 | 68.3 | $4.00 | 17.1 |
| 96 | OpenAI: o3 | 85.3 | $5.00 | 17.1 |
| 97 | Google: Gemini 2.5 Pro | 89.5 | $5.63 | 15.9 |
| 98 | OpenAI: GPT-5 | 87.1 | $5.63 | 15.5 |
| 99 | OpenAI: GPT-5 Codex | 86.5 | $5.63 | 15.4 |
| 100 | OpenAI: GPT-5.1-Codex | 86.0 | $5.63 | 15.3 |
| 101 | Google: Gemini 2.5 Pro Preview 05-06 | 83.7 | $5.63 | 14.9 |
| 102 | OpenAI: GPT-4o | 74.8 | $6.25 | 12.0 |
| 103 | AI21: Jamba Large 1.7 | 57.7 | $5.00 | 11.5 |
| 104 | Cohere: Command A | 71.2 | $6.25 | 11.4 |
| 105 | OpenAI: GPT-5.2-Codex | 86.0 | $7.88 | 10.9 |
| 106 | Anthropic: Claude 3.7 Sonnet (thinking) | 87.5 | $9.00 | 9.7 |
| 107 | Anthropic: Claude Sonnet 4.6 | 77.2 | $9.00 | 8.6 |
| 108 | Perplexity: Sonar Pro | 75.5 | $9.00 | 8.4 |
| 109 | xAI: Grok 3 Beta | 70.3 | $9.00 | 7.8 |
| 110 | OpenAI: GPT-4o (2024-05-13) | 74.0 | $10.00 | 7.4 |
| 111 | Cohere: Command R+ (08-2024) | 43.2 | $6.25 | 6.9 |
| 112 | Anthropic: Claude Opus 4.5 | 69.6 | $15.00 | 4.6 |
| 113 | Anthropic: Claude Opus 4.6 | 69.6 | $15.00 | 4.6 |
| 114 | OpenAI: GPT-4 Turbo | 69.4 | $20.00 | 3.5 |
| 115 | OpenAI: o1 | 84.1 | $37.50 | 2.2 |
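
The value metric described for this table (score divided by average price per million tokens) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `value_score` and the three sample rows are chosen here for the example, and the page does not say how the average price itself is derived. The listed prices also appear to be rounded to two decimals, so recomputing from a displayed price will not always reproduce the listed value (for the top row, 50.5 / 0.01 gives 5050, not 3366.7, presumably because the page divides by an unrounded price).

```python
def value_score(score: float, avg_price_per_million: float) -> float:
    """Benchmark score divided by average price (USD) per million tokens.

    Hypothetical helper for illustration; not taken from the leaderboard's code.
    """
    if avg_price_per_million <= 0:
        raise ValueError("average price must be positive")
    return score / avg_price_per_million


# Three rows copied from the table: (model, score, average $ per million tokens).
rows = [
    ("Meta: Llama 3.1 405B Instruct", 73.2, 4.00),
    ("Nous: Hermes 3 405B Instruct", 82.9, 1.00),
    ("OpenAI: GPT-4.1 Mini", 77.5, 1.00),
]

# Rank by value, highest first, mirroring the ordering used in the table.
ranked = sorted(rows, key=lambda r: value_score(r[1], r[2]), reverse=True)

for name, score, price in ranked:
    print(f"{name}: {value_score(score, price):.1f}")
```

For these three rows the displayed prices are round figures, so the computed values (82.9, 77.5 and 18.3) agree with the Value column.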