Benchmark Leaderboard

GPQA

117 models with GPQA scores, ranked from highest to lowest.

Score Leaderboard

# | Model | Developer | Score
1 | Google: Gemini 2.5 Pro | google | 88.7
2 | DeepSeek: DeepSeek V3.2 Speciale | deepseek | 87.1
3 | OpenAI: GPT-5.2-Codex | openai | 86.0
4 | OpenAI: GPT-5.1-Codex | openai | 86.0
5 | OpenAI: GPT-5 | openai | 85.4
6 | MoonshotAI: Kimi K2 Thinking | moonshotai | 83.8
7 | OpenAI: GPT-5 Codex | openai | 83.7
8 | Anthropic: Claude 3.7 Sonnet (thinking) | anthropic | 83.4
9 | MiniMax: MiniMax M2.5 | minimax | 83.0
10 | OpenAI: GPT-5 Mini | openai | 82.8
11 | OpenAI: o3 | openai | 82.7
12 | Google: Gemini 2.5 Pro Preview 05-06 | google | 82.2
13 | DeepSeek: R1 | deepseek | 81.3
14 | OpenAI: GPT-5.1-Codex-Mini | openai | 81.3
15 | Google: Gemini 2.5 Flash | google | 81.2
16 | Baidu: ERNIE 4.5 300B A47B | baidu | 81.1
17 | xAI: Grok 3 Mini | x-ai | 79.1
18 | Qwen: Qwen3 235B A22B Thinking 2507 | qwen | 79.0
19 | OpenAI: o4 Mini | openai | 78.4
20 | OpenAI: gpt-oss-120b | openai | 78.2
21 | MiniMax: MiniMax M2 | minimax | 77.7
22 | Qwen: Qwen3 Max Thinking | qwen | 77.6
23 | OpenAI: o3 Mini High | openai | 77.3
24 | Qwen: Qwen3 VL 235B A22B Thinking | qwen | 77.2
25 | MoonshotAI: Kimi K2 0905 | moonshotai | 76.7
26 | MoonshotAI: Kimi K2.5 | moonshotai | 76.6
27 | Qwen: Qwen3 Max | qwen | 76.4
28 | Kwaipilot: KAT-Coder-Pro V1 | kwaipilot | 76.4
29 | Prime Intellect: INTELLECT-3 | prime-intellect | 76.1
30 | Qwen: Qwen3 Next 80B A3B Thinking | qwen | 75.9
31 | NVIDIA: Nemotron 3 Nano 30B A3B (free) | nvidia | 75.7
32 | Qwen: Qwen3 235B A22B Instruct 2507 | qwen | 75.3
33 | DeepSeek: DeepSeek V3.2 | deepseek | 75.1
34 | DeepSeek: DeepSeek V3.1 Terminus | deepseek | 75.1
35 | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | nvidia | 74.8
36 | OpenAI: o3 Mini | openai | 74.8
37 | OpenAI: o1 | openai | 74.7
38 | Qwen: Qwen3 Next 80B A3B Instruct | qwen | 73.8
39 | Z.ai: GLM 4.5 Air | z-ai | 73.3
40 | Qwen: Qwen3 VL 32B Instruct | qwen | 73.3
41 | Nous: Hermes 3 405B Instruct | nousresearch | 72.7
42 | xAI: Grok Code Fast 1 | x-ai | 72.7
43 | Qwen: Qwen3 VL 30B A3B Thinking | qwen | 72.0
44 | Z.ai: GLM 4.6V | z-ai | 71.9
45 | Qwen: Qwen3 VL 235B A22B Instruct | qwen | 71.2
46 | Qwen: Qwen3 30B A3B Thinking 2507 | qwen | 70.7
47 | Qwen: Qwen3 VL 30B A3B Instruct | qwen | 69.5
48 | Upstage: Solar Pro 3 | upstage | 68.7
49 | OpenAI: GPT-4.1 Mini | openai | 68.7
50 | MiniMax: MiniMax M1 | minimax | 68.2
51 | Mistral Large | mistralai | 68.0
52 | OpenAI: GPT-5 Nano | openai | 67.6
53 | Anthropic: Claude Haiku 4.5 | anthropic | 67.2
54 | Qwen: Qwen3 30B A3B | qwen | 65.9
55 | Xiaomi: MiMo-V2-Flash | xiaomi | 65.6
56 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google | 65.1
57 | xAI: Grok 4 Fast | x-ai | 63.7
58 | xAI: Grok 4.1 Fast | x-ai | 63.7
59 | Google: Gemini 2.5 Flash Lite | google | 62.5
60 | Qwen: Qwen3 30B A3B Instruct 2507 | qwen | 62.0
61 | DeepSeek: R1 Distill Qwen 32B | deepseek | 61.5
62 | Qwen: Qwen3 235B A22B | qwen | 61.3
63 | OpenAI: gpt-oss-20b | openai | 61.1
64 | Anthropic: Claude Sonnet 4.6 | anthropic | 59.9
65 | AllenAI: Olmo 3.1 32B Think | allenai | 59.1
66 | Mistral: Mistral Medium 3.1 | mistralai | 58.8
67 | Qwen: Qwen3 VL 8B Thinking | qwen | 57.9
68 | Perplexity: Sonar Pro | perplexity | 57.8
69 | Z.ai: GLM 4.5V | z-ai | 57.3
70 | NVIDIA: Nemotron Nano 12B 2 VL (free) | nvidia | 57.2
71 | NVIDIA: Nemotron Nano 9B V2 | nvidia | 57.0
72 | NVIDIA: Nemotron Nano 9B V2 (free) | nvidia | 57.0
73 | Qwen: QwQ 32B | qwen | 55.7
74 | OpenAI: GPT-4o | openai | 54.3
75 | Qwen: Qwen3 32B | qwen | 53.5
76 | Google: Gemini 2.0 Flash Lite | google | 53.5
77 | Mistral: Devstral Small 1.1 | mistralai | 53.2
78 | Cohere: Command A | cohere | 52.7
79 | OpenAI: GPT-4o (2024-05-13) | openai | 52.6
80 | Qwen: Qwen3 Coder 30B A3B Instruct | qwen | 51.6
81 | AllenAI: Olmo 3 7B Think | allenai | 51.6
82 | Meta: Llama 3.1 405B Instruct | meta-llama | 51.5
83 | Mistral: Pixtral Large 2411 | mistralai | 50.5
84 | NVIDIA: Llama 3.1 Nemotron 70B Instruct | nvidia | 49.8
85 | Meta: Llama 3.3 70B Instruct | meta-llama | 49.8
86 | Mistral: Devstral Medium | mistralai | 49.2
87 | Qwen: Qwen2.5 VL 72B Instruct | qwen | 49.1
88 | Nous: Hermes 3 70B Instruct | nousresearch | 49.1
89 | Anthropic: Claude Opus 4.5 | anthropic | 48.9
90 | Anthropic: Claude Opus 4.6 | anthropic | 48.9
91 | Mistral Large 2407 | mistralai | 47.2
92 | xAI: Grok 3 Beta | x-ai | 47.1
93 | Perplexity: Sonar | perplexity | 47.1
94 | Qwen: Qwen3 14B | qwen | 47.0
95 | Qwen: Qwen3 8B | qwen | 45.2
96 | NVIDIA: Nemotron Nano 12B 2 VL | nvidia | 43.9
97 | Qwen: Qwen3 VL 8B Instruct | qwen | 42.7
98 | OpenAI: GPT-4o-mini | openai | 42.6
99 | Mistral: Saba | mistralai | 42.4
100 | Qwen2.5 Coder 32B Instruct | qwen | 41.7
101 | Qwen: Qwen2.5 VL 32B Instruct | qwen | 41.7
102 | Qwen: Qwen-Turbo | qwen | 41.0
103 | DeepSeek: R1 Distill Llama 70B | deepseek | 40.2
104 | AllenAI: Olmo 3 7B Instruct | allenai | 40.0
105 | NVIDIA: Nemotron 3 Nano 30B A3B | nvidia | 39.9
106 | AI21: Jamba Large 1.7 | ai21 | 39.0
107 | LiquidAI: LFM2-8B-A1B | liquid | 34.4
108 | Qwen: Qwen2.5 Coder 7B Instruct | qwen | 33.9
109 | Cohere: Command R+ (08-2024) | cohere | 32.3
110 | OpenAI: GPT-3.5 Turbo | openai | 29.7
111 | Google: Gemma 3n 4B | google | 29.6
112 | Mistral: Mixtral 8x7B Instruct | mistralai | 29.2
113 | Meta: Llama 3.1 8B Instruct | meta-llama | 25.9
114 | Meta: Llama 3.2 3B Instruct | meta-llama | 25.5
115 | Meta: Llama 3.2 11B Vision Instruct | meta-llama | 22.1
116 | Meta: Llama 3.2 1B Instruct | meta-llama | 19.6
117 | Mistral: Mistral 7B Instruct v0.1 | mistralai | 17.7

Value Rankings

Best performance per dollar: GPQA score divided by the average price (USD) per million tokens.

# | Model | Score | Avg $/M tokens | Value
1 | LiquidAI: LFM2-8B-A1B | 34.4 | $0.01 | 2293.3
2 | Google: Gemma 3n 4B | 29.6 | $0.03 | 986.7
3 | Qwen: Qwen3 235B A22B Instruct 2507 | 75.3 | $0.09 | 880.7
4 | Meta: Llama 3.1 8B Instruct | 25.9 | $0.03 | 740.0
5 | OpenAI: gpt-oss-20b | 61.1 | $0.09 | 718.8
6 | OpenAI: gpt-oss-120b | 78.2 | $0.11 | 683.0
7 | NVIDIA: Nemotron Nano 9B V2 | 57.0 | $0.10 | 570.0
8 | Qwen: Qwen2.5 Coder 7B Instruct | 33.9 | $0.06 | 565.0
9 | Qwen: Qwen-Turbo | 41.0 | $0.08 | 504.6
10 | Meta: Llama 3.2 11B Vision Instruct | 22.1 | $0.05 | 451.0
11 | Qwen: Qwen3 30B A3B | 65.9 | $0.18 | 366.1
12 | Qwen: Qwen3 30B A3B Thinking 2507 | 70.7 | $0.20 | 361.6
13 | Xiaomi: MiMo-V2-Flash | 65.6 | $0.19 | 345.3
14 | Qwen: Qwen3 32B | 53.5 | $0.16 | 334.4
15 | AllenAI: Olmo 3 7B Think | 51.6 | $0.16 | 322.5
16 | NVIDIA: Nemotron 3 Nano 30B A3B | 39.9 | $0.12 | 319.2
17 | Qwen: Qwen3 30B A3B Instruct 2507 | 62.0 | $0.20 | 317.9
18 | Qwen: Qwen3 14B | 47.0 | $0.15 | 313.3
19 | Qwen: Qwen3 Coder 30B A3B Instruct | 51.6 | $0.17 | 303.5
20 | OpenAI: GPT-5 Nano | 67.6 | $0.22 | 300.4
21 | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | 74.8 | $0.25 | 299.2
22 | Google: Gemini 2.0 Flash Lite | 53.5 | $0.19 | 285.3
23 | Qwen: Qwen3 VL 32B Instruct | 73.3 | $0.26 | 281.9
24 | AllenAI: Olmo 3 7B Instruct | 40.0 | $0.15 | 266.7
25 | Mistral: Devstral Small 1.1 | 53.2 | $0.20 | 266.0
26 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | 65.1 | $0.25 | 260.4
27 | Google: Gemini 2.5 Flash Lite | 62.5 | $0.25 | 250.0
28 | Meta: Llama 3.3 70B Instruct | 49.8 | $0.21 | 237.1
29 | DeepSeek: DeepSeek V3.2 | 75.1 | $0.32 | 231.1
30 | Qwen: Qwen3 235B A22B Thinking 2507 | 79.0 | $0.35 | 222.5
31 | Qwen: Qwen3 VL 30B A3B Instruct | 69.5 | $0.33 | 213.8
32 | DeepSeek: R1 Distill Qwen 32B | 61.5 | $0.29 | 212.1
33 | Qwen2.5 Coder 32B Instruct | 41.7 | $0.20 | 208.5
34 | Qwen: QwQ 32B | 55.7 | $0.27 | 202.5
35 | Qwen: Qwen3 8B | 45.2 | $0.22 | 200.9
36 | xAI: Grok 3 Mini | 79.1 | $0.40 | 197.7
37 | Upstage: Solar Pro 3 | 68.7 | $0.38 | 183.2
38 | xAI: Grok 4 Fast | 63.7 | $0.35 | 182.0
39 | xAI: Grok 4.1 Fast | 63.7 | $0.35 | 182.0
40 | AllenAI: Olmo 3.1 32B Think | 59.1 | $0.33 | 181.8
41 | Meta: Llama 3.2 1B Instruct | 19.6 | $0.11 | 172.7
42 | Nous: Hermes 3 70B Instruct | 49.1 | $0.30 | 163.7
43 | DeepSeek: DeepSeek V3.1 Terminus | 75.1 | $0.50 | 150.2
44 | Z.ai: GLM 4.5 Air | 73.3 | $0.49 | 149.6
45 | Kwaipilot: KAT-Coder-Pro V1 | 76.4 | $0.52 | 147.6
46 | Qwen: Qwen3 VL 8B Instruct | 42.7 | $0.29 | 147.2
47 | Qwen: Qwen3 VL 235B A22B Instruct | 71.2 | $0.54 | 131.9
48 | Meta: Llama 3.2 3B Instruct | 25.5 | $0.20 | 130.4
49 | Qwen: Qwen3 Next 80B A3B Instruct | 73.8 | $0.60 | 124.0
50 | MiniMax: MiniMax M2 | 77.7 | $0.63 | 123.8
51 | Z.ai: GLM 4.6V | 71.9 | $0.60 | 119.8
52 | Mistral: Mistral 7B Instruct v0.1 | 17.7 | $0.15 | 118.0
53 | Baidu: ERNIE 4.5 300B A47B | 81.1 | $0.69 | 117.5
54 | Prime Intellect: INTELLECT-3 | 76.1 | $0.65 | 117.1
55 | OpenAI: GPT-4o-mini | 42.6 | $0.38 | 113.6
56 | Qwen: Qwen3 Next 80B A3B Thinking | 75.9 | $0.67 | 112.4
57 | MiniMax: MiniMax M2.5 | 83.0 | $0.75 | 111.0
58 | NVIDIA: Nemotron Nano 12B 2 VL | 43.9 | $0.40 | 109.8
59 | DeepSeek: DeepSeek V3.2 Speciale | 87.1 | $0.80 | 108.9
60 | Mistral: Saba | 42.4 | $0.40 | 106.0
61 | Qwen: Qwen2.5 VL 32B Instruct | 41.7 | $0.40 | 104.3
62 | xAI: Grok Code Fast 1 | 72.7 | $0.85 | 85.5
63 | Qwen: Qwen3 VL 8B Thinking | 57.9 | $0.74 | 78.1
64 | OpenAI: GPT-5 Mini | 82.8 | $1.13 | 73.6
65 | Nous: Hermes 3 405B Instruct | 72.7 | $1.00 | 72.7
66 | OpenAI: GPT-5.1-Codex-Mini | 81.3 | $1.13 | 72.3
67 | OpenAI: GPT-4.1 Mini | 68.7 | $1.00 | 68.7
68 | MoonshotAI: Kimi K2 Thinking | 83.8 | $1.23 | 67.9
69 | MoonshotAI: Kimi K2 0905 | 76.7 | $1.20 | 63.9
70 | Qwen: Qwen2.5 VL 72B Instruct | 49.1 | $0.80 | 61.4
71 | Google: Gemini 2.5 Flash | 81.2 | $1.40 | 58.0
72 | MoonshotAI: Kimi K2.5 | 76.6 | $1.33 | 57.8
73 | Mistral: Mixtral 8x7B Instruct | 29.2 | $0.54 | 54.1
74 | Qwen: Qwen3 235B A22B | 61.3 | $1.14 | 53.9
75 | DeepSeek: R1 Distill Llama 70B | 40.2 | $0.75 | 53.6
76 | MiniMax: MiniMax M1 | 68.2 | $1.30 | 52.5
77 | DeepSeek: R1 | 81.3 | $1.60 | 50.8
78 | Mistral: Mistral Medium 3.1 | 58.8 | $1.20 | 49.0
79 | Z.ai: GLM 4.5V | 57.3 | $1.20 | 47.8
80 | Perplexity: Sonar | 47.1 | $1.00 | 47.1
81 | NVIDIA: Llama 3.1 Nemotron 70B Instruct | 49.8 | $1.20 | 41.5
82 | Mistral: Devstral Medium | 49.2 | $1.20 | 41.0
83 | Qwen: Qwen3 Max Thinking | 77.6 | $2.34 | 33.2
84 | OpenAI: GPT-3.5 Turbo | 29.7 | $1.00 | 29.7
85 | OpenAI: o4 Mini | 78.4 | $2.75 | 28.5
86 | OpenAI: o3 Mini High | 77.3 | $2.75 | 28.1
87 | OpenAI: o3 Mini | 74.8 | $2.75 | 27.2
88 | Anthropic: Claude Haiku 4.5 | 67.2 | $3.00 | 22.4
89 | Qwen: Qwen3 Max | 76.4 | $3.60 | 21.2
90 | Mistral Large | 68.0 | $4.00 | 17.0
91 | OpenAI: o3 | 82.7 | $5.00 | 16.5
92 | Google: Gemini 2.5 Pro | 88.7 | $5.63 | 15.8
93 | OpenAI: GPT-5.1-Codex | 86.0 | $5.63 | 15.3
94 | OpenAI: GPT-5 | 85.4 | $5.63 | 15.2
95 | OpenAI: GPT-5 Codex | 83.7 | $5.63 | 14.9
96 | Google: Gemini 2.5 Pro Preview 05-06 | 82.2 | $5.63 | 14.6
97 | Meta: Llama 3.1 405B Instruct | 51.5 | $4.00 | 12.9
98 | Mistral: Pixtral Large 2411 | 50.5 | $4.00 | 12.6
99 | Mistral Large 2407 | 47.2 | $4.00 | 11.8
100 | OpenAI: GPT-5.2-Codex | 86.0 | $7.88 | 10.9
101 | Anthropic: Claude 3.7 Sonnet (thinking) | 83.4 | $9.00 | 9.3
102 | OpenAI: GPT-4o | 54.3 | $6.25 | 8.7
103 | Cohere: Command A | 52.7 | $6.25 | 8.4
104 | AI21: Jamba Large 1.7 | 39.0 | $5.00 | 7.8
105 | Anthropic: Claude Sonnet 4.6 | 59.9 | $9.00 | 6.7
106 | Perplexity: Sonar Pro | 57.8 | $9.00 | 6.4
107 | OpenAI: GPT-4o (2024-05-13) | 52.6 | $10.00 | 5.3
108 | xAI: Grok 3 Beta | 47.1 | $9.00 | 5.2
109 | Cohere: Command R+ (08-2024) | 32.3 | $6.25 | 5.2
110 | Anthropic: Claude Opus 4.5 | 48.9 | $15.00 | 3.3
111 | Anthropic: Claude Opus 4.6 | 48.9 | $15.00 | 3.3
112 | OpenAI: o1 | 74.7 | $37.50 | 2.0
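
The value metric in this section (score divided by average price per million tokens) can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the names Entry, value, and rank_by_value are hypothetical, and rejecting non-positive prices is an assumption about how unpriced models might be handled. The sample rows are taken from the table, chosen because their displayed prices reproduce the listed values. Many other rows will not reproduce exactly from the displayed price (for example, 34.4 / 0.01 is 3440, not the listed 2293.3), which suggests the page computes Value from unrounded prices.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float      # GPQA score as listed on the leaderboard
    avg_price: float  # average USD per million tokens, as displayed (rounded)


def value(entry: Entry) -> float:
    """Performance per dollar: score divided by average price per million tokens."""
    if entry.avg_price <= 0:
        # Assumption: entries without a positive price cannot be ranked by value.
        raise ValueError(f"{entry.model}: price must be positive")
    return entry.score / entry.avg_price


def rank_by_value(entries: list[Entry]) -> list[Entry]:
    """Return entries sorted from best to worst value."""
    return sorted(entries, key=value, reverse=True)


# Three rows copied from the table above.
entries = [
    Entry("OpenAI: o1", 74.7, 37.50),
    Entry("Nous: Hermes 3 405B Instruct", 72.7, 1.00),
    Entry("OpenAI: GPT-4.1 Mini", 68.7, 1.00),
]

for rank, e in enumerate(rank_by_value(entries), start=1):
    print(f"{rank} | {e.model} | {e.score} | ${e.avg_price:.2f} | {value(e):.1f}")
```

Within this three-row sample, Hermes 3 405B Instruct comes first, then GPT-4.1 Mini, then o1, the same relative order these models have in the full table.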