We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to convert existing full-attention models into sparse versions at a limited compute budget. In long-context scenarios, LoZA achieves significant speed-ups in both prefill-intensive cases (e.g., retrieval-augmented generation) and decode-intensive cases (e.g., tool-integrated reasoning). Specifically, by applying LoZA to LongCat-Flash during mid-training, we present LongCat-Flash-Exp, a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.
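To make the idea of sparsifying a full-attention model concrete, the minimal sketch below shows a generic block-sparse causal attention mask that combines a local sliding window with periodic global anchor tokens. This is an illustrative assumption only: the actual LoZA zigzag pattern is not described in this section, and the function names and parameters (`sparse_causal_mask`, `window`, `stride`) are hypothetical.

```python
# Illustrative sketch, NOT the actual LoZA pattern: a sparse causal attention
# mask that keeps a local sliding window plus periodic "global" key positions.
import torch

def sparse_causal_mask(seq_len: int, window: int = 128, stride: int = 512) -> torch.Tensor:
    """Boolean mask where True marks query-key pairs allowed to attend."""
    q_idx = torch.arange(seq_len).unsqueeze(1)   # (seq_len, 1) query positions
    k_idx = torch.arange(seq_len).unsqueeze(0)   # (1, seq_len) key positions
    causal = k_idx <= q_idx                      # never attend to future tokens
    local = (q_idx - k_idx) < window             # sliding-window neighborhood
    strided = (k_idx % stride) == 0              # periodic global anchor keys
    return causal & (local | strided)

def sparse_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     window: int = 128, stride: int = 512) -> torch.Tensor:
    """Scaled dot-product attention restricted to the sparse mask above."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = sparse_causal_mask(q.shape[-2], window, stride).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Under such a pattern, each query attends to O(window + seq_len / stride) keys instead of all preceding tokens, which is the general mechanism by which sparse attention reduces both prefill compute and the decode-time KV reads that the abstract refers to.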