We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion unique tokens that were not used for Nemotron 2, followed by supervised fine-tuning and large-scale reinforcement learning on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous-generation Nemotron 2 Nano while activating fewer than half as many parameters per forward pass. It delivers up to 3.3x higher inference throughput than similarly sized open models such as GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths of up to 1M tokens. We release both the pretrained Nemotron 3 Nano 30B-A3B Base and the post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.
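As a usage illustration, the released checkpoints should be loadable through the standard Hugging Face transformers path. The sketch below assumes a hypothetical repository ID (the exact repo names are not stated here); consult the published model cards for the actual identifiers and loading options.

```python
# Minimal sketch of loading the post-trained checkpoint with Hugging Face transformers.
# The repository ID below is an assumption for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Nano-30B-A3B"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick an appropriate dtype for the hardware
    device_map="auto",       # place/shard the model across available devices
    trust_remote_code=True,  # hybrid Mamba-Transformer models may ship custom modeling code
)

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```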