State space models (SSMs) have recently been shown to be very effective as a deep learning layer and a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies because it uses a prescribed state matrix called the HiPPO matrix. While the HiPPO matrix provides an interpretable mathematical mechanism for modeling long dependencies, it requires a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain mathematically why DSS works, by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model, S4D, is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and which performs comparably to S4 in almost all settings, with state-of-the-art results for image, audio, and medical time-series domains, and an average of 85% on the Long Range Arena benchmark.
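To make the "2 lines of code" claim concrete, here is a minimal NumPy sketch of a diagonal SSM convolution kernel. It is an illustration under stated assumptions, not the authors' released implementation: the function name `diagonal_ssm_kernel`, the zero-order-hold discretization, the choice to keep only the real part, and the example parameter values are all assumptions made here, since the abstract does not specify them.

```python
import numpy as np

def diagonal_ssm_kernel(A, B, C, dt, L):
    """Length-L convolution kernel of a diagonal SSM x' = Ax + Bu, y = Cx.

    A, B, C are complex vectors of length N (A holds the diagonal of the
    state matrix); dt is the step size. All names are hypothetical.
    """
    # Line 1: discretize. Zero-order hold is assumed here; because A is
    # diagonal, the matrix exponential reduces to an elementwise exp.
    dA, dB = np.exp(dt * A), (np.exp(dt * A) - 1.0) / A * B
    # Line 2: Vandermonde product, K[l] = sum_n C[n] * dB[n] * dA[n]**l.
    # Keeping the real part is an assumption so the kernel is real-valued.
    return ((C * dB) @ (dA[:, None] ** np.arange(L))).real

# Usage with illustrative (assumed) parameter values.
N, L = 8, 16
rng = np.random.default_rng(0)
A = -0.5 + 1j * np.pi * np.arange(N)   # stable: negative real parts
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
K = diagonal_ssm_kernel(A, B, C, dt=0.01, L=L)
```

The diagonal structure is what keeps this short: each state coordinate evolves independently, so the kernel is a weighted sum of geometric sequences rather than a sequence of matrix powers.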