Model Cost Profile
Developer: nvidia · Tokenizer: Llama3
Pricing updated Mar 22, 2026 · Last synced Mar 22, 2026 · MMLU score via public benchmark data
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.
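Since the reasoning toggle lives in the system prompt, here is a minimal sketch of the request payload an OpenAI-compatible client would send. The model ID and the sampling settings are assumptions drawn from the Hugging Face card linked above; nothing here is executed against a live endpoint.

```python
# Sketch: build an OpenAI-compatible chat payload that toggles
# Nemotron's reasoning mode via the system prompt.
# Model ID below is an assumption based on the Hugging Face card.

def build_request(user_message: str, reasoning: bool = True) -> dict:
    """Return a chat-completions payload with reasoning toggled on or off."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 1024,
        "temperature": 0.6,  # sampling for reasoning mode (assumed from the card)
    }

payload = build_request("Why is the sky blue?")
```

Sending the same payload with `reasoning=False` produces a `detailed thinking off` system prompt, which the card describes as the non-reasoning mode.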
| Metric | Value | Notes |
|---|---|---|
| Context Window | 131,072 | Input tokens; full-context input ≈ $0.08 |
| Max Output | — | Not specified |
| Input Price / 1M | $0.60 | Prompt tokens |
| Output Price / 1M | $1.80 | Completion tokens |
| Top Benchmark | 82.5 | MMLU score (highest of MMLU, GPQA, MATH & HumanEval) |
Evaluation scores for NVIDIA: Llama 3.1 Nemotron Ultra 253B v1. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH & HumanEval.
Price History
Not enough data yet; price tracking started recently.
Performance History
Not enough data yet; performance tracking started recently.
| Usage Type | Price / 1M Tokens |
|---|---|
| Input (Prompt) | $0.60 |
| Output (Completion) | $1.80 |
Estimate monthly spend for NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 based on your workload.
Estimated Monthly Cost
$37
25M input + 12M output tokens
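The estimate above is straightforward arithmetic from the per-1M prices in the table. A minimal sketch (prices from this page; the 25M/12M volumes are just the example workload shown above):

```python
# Sketch: reproduce the monthly-spend estimate from the listed per-1M prices.
INPUT_PRICE_PER_M = 0.60    # $ per 1M prompt tokens
OUTPUT_PRICE_PER_M = 1.80   # $ per 1M completion tokens

def monthly_cost(input_millions: float, output_millions: float) -> float:
    """Dollar cost for a month's usage, volumes given in millions of tokens."""
    return input_millions * INPUT_PRICE_PER_M + output_millions * OUTPUT_PRICE_PER_M

# 25M input + 12M output tokens, as in the estimate above:
cost = monthly_cost(25, 12)  # 25 * 0.60 + 12 * 1.80 = 36.60, displayed as ~$37

# The "full-context input ≈ $0.08" figure is the same arithmetic
# applied to one maximal 131,072-token prompt:
full_context = 131_072 / 1_000_000 * INPUT_PRICE_PER_M  # ≈ $0.079
```

Swapping in your own token volumes gives a quick sanity check against the estimator widget.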
Quick links to support cost-reduction decisions before a production rollout.