We introduce the Nemotron 3 family of models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include Multi-Token Prediction (MTP) layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning, enabling reasoning, multi-step tool use, and granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.
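To make the Mixture-of-Experts component of the architecture concrete, the sketch below shows generic top-k expert routing: each token's router scores select a small subset of expert networks, so only a fraction of the parameters run per token. This is a minimal illustration of the general MoE technique, not the Nemotron 3 (or LatentMoE) implementation; all names, shapes, and the k=2 choice are illustrative assumptions.

```python
# Minimal top-k Mixture-of-Experts routing sketch (illustrative only;
# expert count, dimensions, and k are hypothetical, not Nemotron 3's).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One tiny linear "expert" per slot; real experts are full feed-forward blocks.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the k best experts
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)              # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                        # per-token dispatch (batched in practice)
        for j in range(top_k):
            out[t] += gates[t, j] * (x[t] @ experts[top[t, j]])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): only 2 of 8 experts run per token
```

The design point this illustrates is why MoE layers raise throughput: per-token compute scales with the k active experts rather than the total parameter count, which is what lets the larger Nemotron 3 models grow capacity without a proportional inference cost.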