A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly sequences with long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of $10{,}000$ or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) \( x'(t) = Ax(t) + Bu(t),\ y(t) = Cx(t) + Du(t) \), and showed that for appropriate choices of the state matrix \( A \), this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4), based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning \( A \) with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91\% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet; (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation $60\times$ faster; and (iii) state-of-the-art (SoTA) results on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k, on which all prior work fails, while being as efficient as all competitors.
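The SSM above, and the idea of reducing it to a Cauchy kernel, can be made concrete with a small numerical sketch. This is not S4's algorithm: it assumes bilinear discretization with a hypothetical step size `dt`, and a diagonal state matrix \( A \), which sidesteps the low-rank correction S4 needs for its structured \( A \). Under those assumptions, the discretized kernel is \( K_k = \sum_n C_n \bar{B}_n \lambda_n^k \), and its length-\( L \) DFT is a Cauchy-kernel sum \( \sum_n v_n / (z_m - \lambda_n) \) (up to a factor \( z_m \)) evaluated at roots of unity. The code checks that the recurrent and convolutional views agree and that the kernel is recovered by an inverse FFT. All function names are hypothetical.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t) with step dt."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    Ab = inv @ (I + (dt / 2.0) * A)
    Bb = inv @ (dt * B)
    return Ab, Bb

def ssm_recurrent(Ab, Bb, C, D, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k + D u_k, x_{-1} = 0."""
    x = np.zeros(Ab.shape[0], dtype=complex)
    ys = []
    for uk in u:
        x = Ab @ x + Bb * uk
        ys.append(C @ x + D * uk)
    return np.array(ys)

def kernel_naive(Ab, Bb, C, L):
    """Convolution kernel K_k = C Ab^k Bb for k = 0..L-1, via repeated mat-vecs."""
    K, v = [], Bb.astype(complex)
    for _ in range(L):
        K.append(C @ v)
        v = Ab @ v
    return np.array(K)

def cauchy(v, z, lam):
    """Cauchy kernel: returns sum_n v[n] / (z[m] - lam[n]) for every node z[m]."""
    return (v[None, :] / (z[:, None] - lam[None, :])).sum(axis=1)

def kernel_cauchy_diag(lam, Bb, C, L):
    """Kernel for a DIAGONAL Ab = diag(lam), via Cauchy sum + inverse FFT (assumption)."""
    omega = np.exp(-2j * np.pi * np.arange(L) / L)  # DFT nodes, omega**L == 1
    z = 1.0 / omega
    # sum_k (lam*omega)^k = (1 - lam^L) / (1 - lam*omega) = (1 - lam^L) * z / (z - lam)
    v = C * Bb * (1 - lam ** L)
    K_hat = z * cauchy(v, z, lam)  # equals np.fft.fft(K)
    return np.fft.ifft(K_hat)

rng = np.random.default_rng(0)
N, L, dt = 4, 16, 0.1
A = np.diag(-0.5 + 1j * np.arange(N))  # hypothetical stable diagonal A
B = rng.standard_normal(N) + 0j
C = rng.standard_normal(N) + 0j
D = 0.5
u = rng.standard_normal(L)

Ab, Bb = discretize_bilinear(A, B, dt)
y_rec = ssm_recurrent(Ab, Bb, C, D, u)
K = kernel_naive(Ab, Bb, C, L)
y_conv = np.convolve(K, u)[:L] + D * u  # causal convolution view
K_fast = kernel_cauchy_diag(np.diag(Ab), Bb, C, L)
print(np.allclose(y_rec, y_conv), np.allclose(K, K_fast))  # True True
```

With a dense \( \bar{A} \), `kernel_naive` needs \( L \) matrix-vector products of cost \( O(N^2) \) each, while the diagonal Cauchy form needs \( O(NL) \) scalar operations plus one FFT. S4's actual state matrix is not stably diagonalizable on its own, which is why the abstract adds a low-rank correction before reaching this Cauchy form.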