Context Window
131,072
Input tokens
Full-context input ≈ $0.01
Model Cost Profile
Developer: nvidia · Tokenizer: Llama3 · Quantization: fp8
Canonical ID: nvidia/llama-3.3-nemotron-super-49b-v1.5
Pricing updated Apr 24, 2026
Live Pricing
Input: $0.1000 per 1M tokens
Output: $0.4000 per 1M tokens
Last synced Apr 24, 2026 · MMLU score via public benchmark data
NVIDIA Llama 3.3 Nemotron Super 49B V1.5 is aimed at natural language processing tasks such as chatbots, content generation, and data analysis. Its 131,072-token context window lets a single request carry long documents or extended conversation history without truncation. Pricing is $0.10 per million input tokens and $0.40 per million output tokens, so spend scales linearly with token volume and can be budgeted directly from expected usage.
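To illustrate how the listed per-token prices translate into per-request cost, here is a minimal sketch. The helper name `request_cost` is hypothetical and not part of any provider API; the prices and context size are the ones shown on this page.

```python
from decimal import Decimal

# Prices listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.10")
OUTPUT_PRICE_PER_M = Decimal("0.40")
CONTEXT_WINDOW = 131_072  # tokens, as listed on this page

def request_cost(input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Return the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / Decimal(1_000_000)

# Cost of filling the whole context window with input tokens.
full_context = request_cost(CONTEXT_WINDOW)
print(f"Full-context input: ${full_context:.4f}")
```

Under these assumptions, a prompt that fills the entire window costs a little over one cent, which matches the "Full-context input ≈ $0.01" figure shown on the page.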
Max Output
—
Not specified
Input Price / 1M
$0.1000
Prompt tokens
Output Price / 1M
$0.4000
Completion tokens
Top Benchmark
78.5
MMLU score — highest of MMLU, GPQA, MATH, HumanEval
Evaluation scores for NVIDIA: Llama 3.3 Nemotron Super 49B V1.5. The “Top Benchmark” shown above is the highest score across MMLU, GPQA, MATH & HumanEval.
Price History
Current Input / 1M
$0.1000
Current Output / 1M
$0.4000
Performance History
Current TPS
0.00
Current Latency
0ms
Uptime
100.0%
| Usage Type | Price / 1M Tokens |
|---|---|
| Input (Prompt) | $0.1000 |
| Output (Completion) | $0.4000 |
Estimate monthly spend for NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 based on your workload.
Estimated Monthly Cost
$7.30
25M input + 12M output tokens
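The $7.30 estimate can be reproduced from the listed prices: 25M input tokens at $0.10 per million plus 12M output tokens at $0.40 per million. A minimal sketch follows, where `monthly_cost` is a hypothetical helper name used only for illustration:

```python
from decimal import Decimal

# USD per 1M tokens, as listed in the pricing table on this page.
PRICES_PER_M = {"input": Decimal("0.10"), "output": Decimal("0.40")}

def monthly_cost(input_millions: Decimal, output_millions: Decimal) -> Decimal:
    """Estimate monthly spend from token volumes given in millions (hypothetical helper)."""
    return (input_millions * PRICES_PER_M["input"]
            + output_millions * PRICES_PER_M["output"])

# 25 * 0.10 + 12 * 0.40 = 2.50 + 4.80
estimate = monthly_cost(Decimal(25), Decimal(12))
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Output tokens cost four times as much as input tokens here, so in this workload the 12M output tokens ($4.80) account for more of the bill than the 25M input tokens ($2.50).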