Model Cost Profile

NVIDIA: Llama 3.1 Nemotron 70B Instruct

Developer: nvidia · Tokenizer: Llama3 · Instruct: llama3 · Quantization: fp8

Canonical ID: nvidia/llama-3.1-nemotron-70b-instruct

Pricing updated Apr 22, 2026

Input rank: #257 · Output rank: #185

Live Pricing

Input: $1.20

Output: $1.20


Last synced Apr 22, 2026 · MMLU score via public benchmark data

The NVIDIA Llama 3.1 Nemotron 70B Instruct model offers a 131,072-token context window, which suits applications that must read long inputs, such as legal document analysis, and it supports up to 16,384 completion tokens for long-form content generation. Input and output are both priced at $1.20 per 1 million tokens, so spend scales linearly with total token volume for high-volume tasks like customer support automation and data summarization. The large context window is most useful for organizations that process large documents or need to maintain context over extended interactions.

🔧 Tool Calling · 🔌 MCP Compatible · 📋 Structured Output

Context Window

131,072

Input tokens

Full-context input ≈ $0.16
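
The full-context figure follows from the listed rate: 131,072 prompt tokens at $1.20 per 1M tokens. A minimal sketch of that arithmetic, assuming simple linear pricing with no caching or volume discounts; the helper name `token_cost` is illustrative and not part of any pricing API:

```python
from decimal import Decimal

CONTEXT_WINDOW = 131_072              # input tokens, from this profile
INPUT_PRICE_PER_M = Decimal("1.20")   # USD per 1M prompt tokens, from this profile

def token_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Return the USD cost of `tokens` at a per-million-token rate (hypothetical helper)."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)

full_context = token_cost(CONTEXT_WINDOW, INPUT_PRICE_PER_M)
print(f"${full_context:.4f}")  # → $0.1573, which rounds to the $0.16 shown above
```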

Max Output

16,384

Completion tokens

Input Price / 1M

$1.20

Prompt tokens

Output Price / 1M

$1.20

Completion tokens

Top Benchmark

69.0

MMLU score — highest of MMLU, GPQA, MATH, HumanEval

Quality & Benchmarks

Evaluation scores for NVIDIA: Llama 3.1 Nemotron 70B Instruct. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH & HumanEval.

Benchmark	Score	Rank	Source
GPQA	46.5	#106 of 125	artificial_analysis
MMLU	69.0	#98 of 127	artificial_analysis
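
The "Top Benchmark" rule described above (highest score across MMLU, GPQA, MATH and HumanEval) can be sketched as follows. This assumes benchmarks without a listed score are simply skipped; the function name `top_benchmark` is hypothetical, and how the page itself handles missing scores is not stated.

```python
from typing import Optional

# Scores from the table above; MATH and HumanEval are not listed in this profile.
scores: dict[str, Optional[float]] = {
    "MMLU": 69.0,
    "GPQA": 46.5,
    "MATH": None,
    "HumanEval": None,
}

def top_benchmark(scores: dict[str, Optional[float]]) -> tuple[str, float]:
    """Return the (name, score) pair with the highest available score."""
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no benchmark scores available")
    best = max(available, key=lambda name: available[name])
    return best, available[best]

print(top_benchmark(scores))  # → ('MMLU', 69.0)
```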

Price History

NVIDIA: Llama 3.1 Nemotron 70B Instruct Pricing Trend

Input / 1M tokens: 0.0% · Output / 1M tokens: 0.0%

Current Input / 1M

$1.20

Current Output / 1M

$1.20

Performance History

NVIDIA: Llama 3.1 Nemotron 70B Instruct Speed Trend

Tokens/sec (higher is better) · Latency (lower is better)

Current TPS

0.00

Current Latency

0ms

Uptime

100.0%

Side-by-Side Pricing Table

Usage Type	Price / 1M Tokens
Input (Prompt)	$1.20
Output (Completion)	$1.20


Cost Calculator

Estimate monthly spend for NVIDIA: Llama 3.1 Nemotron 70B Instruct based on your workload.

Input tokens / month

0 to 1.0B

Output tokens / month

0 to 500M

Estimated Monthly Cost

$44

25M input + 12M output tokens
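
The estimate above can be reproduced from the listed per-token rates. This is a minimal sketch assuming linear pricing with no discounts; `monthly_cost` is an illustrative name, and the page appears to round the result to the nearest dollar.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("1.20")   # USD per 1M prompt tokens, from this profile
OUTPUT_PRICE_PER_M = Decimal("1.20")  # USD per 1M completion tokens, from this profile

def monthly_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate monthly spend in USD for a given token workload (hypothetical helper)."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / million
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / million
    return input_cost + output_cost

# Workload shown in the calculator: 25M input + 12M output tokens.
estimate = monthly_cost(25_000_000, 12_000_000)
print(f"${estimate:.2f}")  # → $44.40, displayed on the page as $44
```

Because input and output share the same rate here, the result depends only on the total token count (37M tokens at $1.20 per 1M).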

Same Workload on Other Models

Model	Monthly Cost	Difference
Arcee AI: Trinity Large Preview (free)	$0.00	−$44
Free Models Router	$0.00	−$44
Google: Gemma 3 12B (free)	$0.00	−$44
Google: Gemma 3 27B (free)	$0.00	−$44

Cheaper Alternatives to Compare

Quick links for cost-down decisions before production rollout.

NVIDIA: Llama 3.1 Nemotron 70B Instruct vs Arcee AI: Trinity Large Preview (free)
NVIDIA: Llama 3.1 Nemotron 70B Instruct vs Free Models Router
NVIDIA: Llama 3.1 Nemotron 70B Instruct vs Google: Gemma 3 12B (free)
NVIDIA: Llama 3.1 Nemotron 70B Instruct vs Google: Gemma 3 27B (free)
