Model Cost Profile

Qwen: Qwen3 VL 8B Thinking

Developer: qwen · Tokenizer: Qwen3 · Quantization: unknown

Canonical ID: qwen/qwen3-vl-8b-thinking

Pricing updated Apr 23, 2026

Input rank: #98 · Output rank: #193

Live Pricing

Input: $0.1170

Output: $1.37


Last synced Apr 23, 2026 · MMLU score via public benchmark data

Qwen: Qwen3 VL 8B Thinking has a context window of 131,072 tokens, which suits long-input tasks such as document summarization and multi-turn conversational AI. At the Live Pricing rates of $0.1170 per million input tokens and $1.37 per million output tokens, output costs roughly 12 times as much per token as input, so budgets for high-volume use depend mainly on completion length. The model fits teams that need scalable natural language understanding and generation on data-intensive workloads where input volume is large relative to output.

👁 Vision · 🔧 Tool Calling · 🔌 MCP Compatible · 📋 Structured Output · 🧠 Reasoning

Context Window

131,072

Input tokens

Full-context input ≈ $0.02

Max Output

32,768

Completion tokens

Input Price / 1M

$0.1170

Prompt tokens

Output Price / 1M

$1.37

Completion tokens

Top Benchmark

74.9

MMLU score — highest of MMLU, GPQA, MATH, HumanEval
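
The "Full-context input ≈ $0.02" figure in the Context Window card can be reproduced from the listed input price. The following is a minimal Python sketch of that arithmetic, assuming strictly linear per-token billing with no minimum charge and no caching discount. The function and constant names are illustrative and do not come from any provider API.

```python
# Values copied from this page; names are hypothetical.
INPUT_PRICE_PER_M = 0.1170   # USD per 1M prompt tokens
CONTEXT_WINDOW = 131_072     # maximum input tokens

def input_cost(tokens: int, price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """Return the prompt cost in USD, assuming linear per-token pricing."""
    return tokens * price_per_m / 1_000_000

full_context = input_cost(CONTEXT_WINDOW)
print(f"${full_context:.4f}")  # the page shows this value rounded to two decimals
```

Under these assumptions, filling the whole context window costs a little over one and a half cents, which the card rounds to $0.02.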

Quality & Benchmarks

Evaluation scores for Qwen: Qwen3 VL 8B Thinking. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH & HumanEval.

Benchmark	Score	Rank	Source
GPQA	57.9	#76 of 125	artificial_analysis
MMLU	74.9	#72 of 127	artificial_analysis

Price History

Qwen: Qwen3 VL 8B Thinking Pricing Trend

Input / 1M tokens: 0.0% · Output / 1M tokens: 0.0%

Current Input / 1M

$0.1170

Current Output / 1M

$1.36

Performance History

Qwen: Qwen3 VL 8B Thinking Speed Trend

Tokens/sec (higher is better) · Latency (lower is better)

Current TPS

0.00

Current Latency

0ms

Uptime

100.0%

Side-by-Side Pricing Table

Usage Type	Price / 1M Tokens
Input (Prompt)	$0.1170
Output (Completion)	$1.37
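
To see how the two rates in the table combine for a single call, here is a hedged Python sketch. It assumes prompt and completion tokens are billed independently at the listed rates. The request size (10,000 prompt tokens and 2,000 completion tokens) is an invented example, not a measurement, and `request_cost` is a hypothetical helper.

```python
# Prices copied from the table above; everything else is illustrative.
INPUT_PRICE_PER_M = 0.1170   # USD per 1M prompt tokens
OUTPUT_PRICE_PER_M = 1.37    # USD per 1M completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request under linear per-token pricing."""
    prompt_usd = prompt_tokens * INPUT_PRICE_PER_M / 1_000_000
    completion_usd = completion_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return prompt_usd + completion_usd

# Hypothetical request: 10,000 prompt tokens, 2,000 completion tokens.
example = request_cost(10_000, 2_000)

# How many times more expensive one output token is than one input token.
ratio = OUTPUT_PRICE_PER_M / INPUT_PRICE_PER_M

print(f"example request: ${example:.5f}")
print(f"output/input price ratio: {ratio:.1f}x")
```

In this example the completion accounts for most of the cost even though it is a fifth of the prompt's length, because each output token is priced at more than eleven times an input token.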


Cost Calculator

Estimate monthly spend for Qwen: Qwen3 VL 8B Thinking based on your workload.

Input tokens / month

0 to 1.0B

Output tokens / month

0 to 500M

Estimated Monthly Cost

$19

25M input + 12M output tokens
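
The $19 estimate for 25M input and 12M output tokens can be checked with a short calculation. This Python sketch assumes the listed prices ($0.1170 input, $1.37 output per 1M tokens) apply linearly across the month, with no volume discounts or free tier. The `monthly_cost` helper is hypothetical and is not the calculator's actual implementation.

```python
# Prices as listed on this page; the helper below is an illustrative sketch.
INPUT_PRICE_PER_M = 0.1170   # USD per 1M prompt tokens
OUTPUT_PRICE_PER_M = 1.37    # USD per 1M completion tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> dict:
    """Break a monthly workload into input, output, and total USD cost."""
    input_usd = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_usd = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return {"input": input_usd, "output": output_usd,
            "total": input_usd + output_usd}

# Workload shown in the calculator: 25M input + 12M output tokens.
estimate = monthly_cost(25_000_000, 12_000_000)
output_share = estimate["output"] / estimate["total"]

print(f"input:  ${estimate['input']:.2f}")
print(f"output: ${estimate['output']:.2f}")
print(f"total:  ${estimate['total']:.2f}")
print(f"output share of spend: {output_share:.0%}")
```

The total rounds to the $19 shown above. Under these assumptions most of the spend comes from output tokens, so reducing completion length lowers the bill more than trimming prompts does.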

Same Workload on Other Models

Model	Estimated Monthly Cost	Difference
Arcee AI: Trinity Large Preview (free)	$0.00	−$19
Free Models Router	$0.00	−$19
Google: Gemma 3 12B (free)	$0.00	−$19
Google: Gemma 3 27B (free)	$0.00	−$19

