
Nemotron 3 Ultra

NVIDIA | Language modeling/generation | Question answering | Open weights

Developed by NVIDIA in 2026, Nemotron 3 Ultra is a language modeling/generation model with 550 billion parameters and openly available weights.

About Nemotron 3 Ultra

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, an

LLM pricing & performance

Nemotron 3 Ultra is available via API, with live cost, context, and benchmark data:

Input / 1M tokens: $0.60
Output / 1M tokens: $2.40
Context: 1M tokens
Tokens/sec: (no value listed)
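
As a rough illustration of how the listed per-million-token prices translate into a per-request cost, here is a minimal Python sketch. The helper name `estimate_cost_usd` and the example token counts are hypothetical and not part of any NVIDIA API. The sketch assumes simple linear pricing at the listed rates, with no caching discounts or volume tiers.

```python
# Listed prices from this page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.60
OUTPUT_USD_PER_M = 2.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper).

    Assumes cost scales linearly with token counts at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

Under these assumptions, input tokens dominate the cost of long-context requests even though output tokens are priced four times higher per token.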

Details

Provider: NVIDIA
Task: Language modeling/generation, Question answering
Parameters: 550 billion (550,000,000,000)
Released: 2026-06-04
Open weights: Yes