Context Window
8,192
Input tokens
Full-context input ≈ $0.0011
Model Cost Profile
Developer: nousresearch · Tokenizer: Llama3 · Instruct: chatml · Quantization: fp16
Canonical ID: nousresearch/hermes-2-pro-llama-3-8b
Pricing updated Apr 24, 2026
Live Pricing
Input: $0.1400
Output: $0.1400
Last synced Apr 24, 2026
NousResearch's Hermes 2 Pro - Llama-3 8B has a context window of 8,192 tokens, which suits short-document summarization and multi-turn conversational AI as long as the prompt and history fit within that limit. Input and output are both priced at $0.14 per million tokens, so spend scales linearly with total token volume and is straightforward to forecast. This makes the model a low-cost option for teams that want to add language capabilities to their products.
Max Output
8,192
Completion tokens
Input Price / 1M
$0.1400
Prompt tokens
Output Price / 1M
$0.1400
Completion tokens
Top Benchmark
Pending
No benchmark data yet
Price History
Current Input / 1M
$0.1400
Current Output / 1M
$0.1400
Performance History
Current TPS
0.00
Current Latency
0ms
Uptime
100.0%
| Usage Type | Price / 1M Tokens |
|---|---|
| Input (Prompt) | $0.1400 |
| Output (Completion) | $0.1400 |
Estimate monthly spend for NousResearch: Hermes 2 Pro - Llama-3 8B based on your workload.
Estimated Monthly Cost
$5.18
25M input + 12M output tokens
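The estimate above follows directly from the per-million-token prices. The sketch below reproduces the arithmetic, assuming the listed prices of $0.14 per 1M tokens for both input and output; `estimate_cost` is a hypothetical helper written for illustration, not part of any provider SDK.

```python
from decimal import Decimal

# Prices per 1M tokens, copied from the pricing table on this page.
INPUT_PRICE_PER_M = Decimal("0.14")
OUTPUT_PRICE_PER_M = Decimal("0.14")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost for the given token counts (hypothetical helper)."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Workload from the calculator: 25M input + 12M output tokens per month.
monthly = estimate_cost(25_000_000, 12_000_000)

# A single prompt that fills the whole 8,192-token context window.
full_context = estimate_cost(8_192, 0)

print(f"Monthly: ${monthly:.2f}")
print(f"Full-context prompt: ${full_context:.6f}")
```

Because input and output share the same price, only the total token count matters here; the two rates are kept separate so the sketch still works if they diverge.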