We introduce Ling 2.0, a series of reasoning-oriented foundation language models built on the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE architecture with multi-token prediction (MTP) for efficient reasoning, reasoning-oriented data and chain-of-thought (CoT) activation during mid-training, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built on the same base.
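To make the "high sparsity" claim concrete, the sketch below shows a generic top-k token-choice MoE layer in which only a small fraction of expert parameters is activated per token; the dimensions, expert count, and routing scheme are illustrative assumptions, not Ling 2.0's actual configuration.

```python
# Minimal sketch of a high-sparsity top-k MoE layer (illustrative only;
# hyperparameters and routing details are assumptions, not Ling 2.0's design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        # Route each token to its top-k experts; only top_k / num_experts of the
        # expert parameters run per token, which is the sparsity (active-compute) ratio.
        gate_logits = self.router(x)
        weights, indices = torch.topk(F.softmax(gate_logits, dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(8, 512)
    print(layer(tokens).shape)  # torch.Size([8, 512]); only 4 of 64 experts run per token
```

In this toy setup only 4 of 64 experts fire per token, so roughly 1/16 of the expert parameters contribute to each forward pass; the abstract's "up to 7-fold active-compute efficiency" is the analogous ratio between a sparse model's activated compute and the dense compute needed to match its quality.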