MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts

We present MoE-DiffuSeq, a mixture-of-experts (MoE) framework for enhancing diffusion models in long-document generation. Existing diffusion-based text generation models, such as DiffuSeq, incur high computational cost and memory overhead when applied to long sequences. To address these challenges, MoE-DiffuSeq integrates sparse attention with a mixture-of-experts architecture, enabling efficient and scalable long-sequence modeling. Our approach introduces a customized sparse attention mechanism that reduces computational complexity while preserving text quality and coherence. In addition, we incorporate a soft absorbing state into the diffusion process to accelerate sequence reconstruction and improve generation precision. Extensive experiments demonstrate that MoE-DiffuSeq significantly improves training efficiency and sampling speed over existing diffusion models. These gains are particularly pronounced in long-document scenarios, including scientific article generation, code repository modeling, and long-form dialogue generation. Benchmark results further show improvements in efficiency, speed, accuracy, and expressiveness, advancing the practical applicability of diffusion models for high-quality long-form text generation.
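To make the two architectural ingredients concrete, here is a minimal sketch in plain Python. The abstract does not specify the sparsity pattern or the routing scheme, so the sliding-window mask, the top-k softmax gate, and every function name below are illustrative assumptions rather than the authors' implementation.

```python
import math

def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j.
    Assumption: a local window |i - j| <= window; the paper's
    customized sparsity pattern may differ."""
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]

def attended_pairs(mask):
    """Count the query-key pairs that are actually scored."""
    return sum(sum(row) for row in mask)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Select the k most probable experts and renormalise their weights.
    Assumption: standard top-k softmax gating (hypothetical here)."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_output(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs; the rest are skipped."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Toy usage: 8 tokens with a window of 1 on each side.
mask = sliding_window_mask(8, 1)
sparse_pairs = attended_pairs(mask)   # grows linearly with seq_len
dense_pairs = 8 * 8                   # full attention grows quadratically

# Three toy "experts" acting on a scalar; only two are evaluated.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
y = moe_output(1.0, experts, gate_logits=[0.0, 2.0, 1.0], k=2)
```

With a fixed window, the number of scored pairs grows linearly with sequence length instead of quadratically, and top-k routing evaluates only k experts per input. These are the kinds of savings the abstract attributes to combining sparse attention with a mixture of experts.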
