Linear time-invariant state space models (SSMs) are a classical class of models from engineering and statistics that have recently shown great promise in machine learning through the Structured State Space sequence model (S4). A core component of S4 is initializing the SSM state matrix to a particular matrix called a HiPPO matrix, which was empirically important for S4's ability to handle long sequences. However, the specific matrix that S4 uses was derived in previous work for a particular time-varying dynamical system, and its use in a time-invariant SSM had no known mathematical interpretation. Consequently, the theoretical mechanism by which S4 models long-range dependencies has remained unexplained. We derive a more general and intuitive formulation of the HiPPO framework, which provides a simple mathematical interpretation of S4 as a decomposition onto exponentially-warped Legendre polynomials, explaining its ability to capture long dependencies. Our generalization introduces a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases, such as the Fourier basis, and explains other aspects of training S4, such as how to initialize the important timescale parameter. These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
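To make the objects in this abstract concrete, the following is a minimal NumPy sketch of a HiPPO-initialized time-invariant SSM, not the S4 implementation. It rests on several assumptions that the abstract itself does not state: the matrix is taken to be the Legendre (LegS) form from the earlier HiPPO work, with entries $A_{nk} = -\sqrt{(2n+1)(2k+1)}$ for $n > k$, $A_{nn} = -(n+1)$, and $B_n = \sqrt{2n+1}$; the continuous system $x'(t) = A x(t) + B u(t)$ is discretized with a bilinear transform; and the step size `dt` is assumed to play the role of the timescale parameter. The function names `hippo_legs`, `discretize_bilinear`, and `run_ssm` are hypothetical and chosen only for illustration.

```python
import numpy as np


def hippo_legs(N):
    """Return (A, B) for an N-dimensional HiPPO-LegS matrix (assumed form)."""
    n = np.arange(N)
    scale = np.sqrt(2 * n + 1)
    # Strictly lower-triangular part: -sqrt(2n+1) * sqrt(2k+1) for n > k.
    A = -np.tril(np.outer(scale, scale), k=-1)
    # Diagonal entries: -(n + 1).
    A -= np.diag(n + 1)
    B = scale.copy()
    return A, B


def discretize_bilinear(A, B, dt):
    """Bilinear discretization of x' = A x + B u with step size dt."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2) * A)
    A_bar = inv @ (I + (dt / 2) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar


def run_ssm(A_bar, B_bar, u):
    """Unroll the recurrence x_k = A_bar x_{k-1} + B_bar u_k from x = 0."""
    x = np.zeros(A_bar.shape[0])
    states = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        states.append(x)
    return np.stack(states)  # shape: (len(u), N)


if __name__ == "__main__":
    A, B = hippo_legs(8)
    A_bar, B_bar = discretize_bilinear(A, B, dt=0.01)
    u = np.sin(np.linspace(0.0, 10.0, 1000))
    states = run_ssm(A_bar, B_bar, u)
    print(states.shape)
```

In this sketch each row of the returned state array holds the coefficients of the input history in the chosen basis at that step. Using the same fixed `A` at every step is the time-invariant usage discussed above, as opposed to the time-varying system for which the matrix was originally derived.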