Model Cost Profile

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Developer: nvidia · Tokenizer: Llama3 · Quantization: fp8

Canonical ID: nvidia/llama-3.3-nemotron-super-49b-v1.5

Pricing updated Apr 24, 2026

Input rank: #91 · Output rank: #103

Live Pricing

Input: $0.1000

Output: $0.4000


Last synced Apr 24, 2026 · MMLU score via public benchmark data

NVIDIA Llama 3.3 Nemotron Super 49B V1.5 targets natural language processing tasks such as chatbots, content generation, and data analysis. Its 131,072-token context window lets teams pass long documents in a single request and keep context across long conversations. Pricing is $0.10 per million input tokens and $0.40 per million output tokens, so spend scales linearly with token volume and can be budgeted from expected usage.
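As a rough illustration of how the per-million-token rates translate into the cost of a single request, here is a minimal Python sketch. The helper name `request_cost` and the constant names are hypothetical, chosen only for this example; the rates and the context size are the ones listed on this page.

```python
# Rates listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40
CONTEXT_WINDOW = 131_072  # tokens


def request_cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Return the USD cost of one request at the listed rates (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Cost of filling the whole context window with prompt tokens.
full_context = request_cost(CONTEXT_WINDOW)
print(f"Full-context input: ${full_context:.4f}")
```

A full 131,072-token prompt comes to roughly $0.013, which is consistent with the "Full-context input ≈ $0.01" figure shown on this page.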

🔧 Tool Calling · 🔌 MCP Compatible · 📋 Structured Output · 🧠 Reasoning

Context Window

131,072

Input tokens

Full-context input ≈ $0.01

Max Output

—

Not specified

Input Price / 1M

$0.1000

Prompt tokens

Output Price / 1M

$0.4000

Completion tokens

Top Benchmark

78.5

MMLU score — highest of MMLU, GPQA, MATH, HumanEval

Quality & Benchmarks

Evaluation scores for NVIDIA: Llama 3.3 Nemotron Super 49B V1.5. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH & HumanEval.

Benchmark	Score	Rank	Source
GPQA	64.3	#58 of 125	artificial_analysis
MMLU	78.5	#53 of 127	artificial_analysis
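The "Top Benchmark" selection described above can be sketched in a few lines of Python. This is only an illustration: the function name `top_benchmark` is hypothetical, and skipping benchmarks that have no reported score (MATH and HumanEval do not appear in the table) is an assumption about how the page handles missing values.

```python
from typing import Optional

# Scores from the table above. MATH and HumanEval are not reported on this page,
# so they are represented as None (an assumption made for this sketch).
scores: dict[str, Optional[float]] = {
    "MMLU": 78.5,
    "GPQA": 64.3,
    "MATH": None,
    "HumanEval": None,
}


def top_benchmark(scores: dict[str, Optional[float]]) -> tuple[str, float]:
    """Return the (name, score) pair with the highest reported score."""
    reported = {name: s for name, s in scores.items() if s is not None}
    if not reported:
        raise ValueError("no benchmark scores reported")
    name = max(reported, key=reported.get)
    return name, reported[name]


print(top_benchmark(scores))  # prints ('MMLU', 78.5)
```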

Price History

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 Pricing Trend

Input / 1M tokens: 0.0% · Output / 1M tokens: 0.0%

Current Input / 1M

$0.1000

Current Output / 1M

$0.4000

Performance History

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 Speed Trend

Tokens/sec (higher is better) · Latency (lower is better)

Current TPS

0.00

Current Latency

0ms

Uptime

100.0%

Side-by-Side Pricing Table

Usage Type	Price / 1M Tokens
Input (Prompt)	$0.1000
Output (Completion)	$0.4000

Compare with NVIDIA: Nemotron 3 Super · Compare with ByteDance Seed: Seed-2.0-Mini · Compare with ByteDance: UI-TARS 7B

Cost Calculator

Estimate monthly spend for NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 based on your workload.

Input tokens / month

0 to 1.0B

Output tokens / month

0 to 500M

Estimated Monthly Cost

$7.30

25M input + 12M output tokens
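The estimate above can be reproduced with a short Python sketch. The helper name `monthly_cost` is hypothetical, and using `Decimal` with rounding to cents is a choice made for this example rather than a description of how the page's calculator works; the rates and the 25M / 12M workload are taken from this page.

```python
from decimal import Decimal

# Rates listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.10")
OUTPUT_PRICE_PER_M = Decimal("0.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def monthly_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate monthly spend in USD, rounded to cents (hypothetical helper)."""
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / TOKENS_PER_MILLION
    return total.quantize(Decimal("0.01"))


# The workload shown above: 25M input + 12M output tokens per month.
estimate = monthly_cost(25_000_000, 12_000_000)
print(f"Estimated monthly cost: ${estimate}")  # prints "Estimated monthly cost: $7.30"
```

The breakdown is $2.50 for input (25 × $0.10) plus $4.80 for output (12 × $0.40), so output tokens account for most of the bill even though there are fewer of them.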

Same Workload on Other Models

Model	Monthly Cost	Difference
Baidu: Qianfan-OCR-Fast (free)	$0.00	−$7.30
Free Models Router	$0.00	−$7.30
Google: Gemma 3 12B (free)	$0.00	−$7.30
Google: Gemma 3 27B (free)	$0.00	−$7.30

Cheaper Alternatives to Compare

Quick links for cost-down decisions before production rollout.

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 vs Baidu: Qianfan-OCR-Fast (free)
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 vs Free Models Router
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 vs Google: Gemma 3 12B (free)
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 vs Google: Gemma 3 27B (free)
