Model Cost Profile
Developer: nvidia · Tokenizer: Llama3 · Instruct: llama3 · Quantization: fp8
Canonical ID: nvidia/llama-3.1-nemotron-70b-instruct
Pricing updated Apr 22, 2026
Live Pricing
Input: $1.20
Output: $1.20
Last synced Apr 22, 2026 · MMLU score via public benchmark data
The NVIDIA Llama 3.1 Nemotron 70B Instruct model has a context window of 131,072 tokens, which suits applications that require extensive text comprehension, such as legal document analysis or long-form content generation. Input and output are both priced at $1.20 per 1 million tokens, so spend depends only on total token volume, which makes costs straightforward to budget for high-volume tasks like customer support automation and data summarization. The long context window is particularly useful for organizations that need to process large inputs or maintain context over extended interactions.
Context Window
131,072
Input tokens
Full-context input ≈ $0.16
Max Output
16,384
Completion tokens
Input Price / 1M
$1.20
Prompt tokens
Output Price / 1M
$1.20
Completion tokens
Top Benchmark
69.0
MMLU score — highest of MMLU, GPQA, MATH, HumanEval
Evaluation scores for NVIDIA: Llama 3.1 Nemotron 70B Instruct. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH, and HumanEval.
Price History
Current Input / 1M
$1.20
Current Output / 1M
$1.20
Performance History
Current TPS
0.00
Current Latency
0ms
Uptime
100.0%
| Usage Type | Price / 1M Tokens |
|---|---|
| Input (Prompt) | $1.20 |
| Output (Completion) | $1.20 |
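The per-token prices in the table can be turned into a per-request cost with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical (they are not part of any provider SDK), and it assumes billing is strictly linear in token count with no minimums or surcharges. It reproduces the "Full-context input ≈ $0.16" figure listed on this page.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens (taken from the table above).
INPUT_PRICE_PER_M = Decimal("1.20")
OUTPUT_PRICE_PER_M = Decimal("1.20")
CONTEXT_WINDOW = 131_072  # listed context window, in tokens
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Return the USD cost of one request, assuming purely linear per-token billing."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


# Cost of a prompt that fills the entire context window, with no completion.
full_context_cost = request_cost(CONTEXT_WINDOW)
print(f"Full-context input: ${round(full_context_cost, 2)}")
```

`Decimal` is used instead of floats so that small per-request amounts do not accumulate rounding error when summed over many requests.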
Estimate monthly spend for NVIDIA: Llama 3.1 Nemotron 70B Instruct based on your workload.
Estimated Monthly Cost
$44
25M input + 12M output tokens
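The monthly estimate above can be checked by hand: 25M input tokens and 12M output tokens at $1.20 per 1M each. A minimal sketch of that calculation follows; `monthly_cost` is a hypothetical helper name, and the whole-dollar rounding is an assumption about how the page displays the figure.

```python
def monthly_cost(
    input_millions: float,
    output_millions: float,
    input_price: float = 1.20,   # USD per 1M input tokens (listed price)
    output_price: float = 1.20,  # USD per 1M output tokens (listed price)
) -> float:
    """Return estimated monthly spend in USD for token volumes given in millions."""
    return input_millions * input_price + output_millions * output_price


# Workload shown in the estimate: 25M input + 12M output tokens per month.
estimate = monthly_cost(25, 12)
print(f"Exact: ${estimate:.2f}")     # 25 * 1.20 + 12 * 1.20
print(f"Rounded: ${round(estimate)}")  # assumed whole-dollar display
```

Because input and output share the same price, the result equals 37M total tokens times $1.20; the two prices are kept as separate parameters so the helper still works if they diverge.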