The Expressive Capacity of State Space Models: A Formal Language Perspective

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance in the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs compared with that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory, even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.
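
To make the star-free state tracking claim concrete, the following is a minimal sketch of how a one-dimensional gated linear recurrence of the form h_t = a(x_t) * h_{t-1} + b(x_t) can track state exactly. The flip-flop task, the token encoding, and the hand-set gate values are assumptions chosen for illustration; they are not taken from the abstract and are not Mamba's actual parameterization.

```python
# Illustrative sketch only: a scalar gated linear recurrence
#   h_t = a(x_t) * h_{t-1} + b(x_t)
# solving a flip-flop task (a star-free state tracking problem).
# The task, token format, and gate values are assumptions for this example.

def gates(token):
    """Return the hypothetical input-dependent pair (a, b) for one token."""
    op, bit = token
    if op == "w":
        # Write: a = 0 erases the old state, b stores the bit as +1 or -1.
        return 0.0, (1.0 if bit == 1 else -1.0)
    # Ignore or read: a = 1 keeps the state, b = 0 adds nothing.
    return 1.0, 0.0


def run_flip_flop(tokens):
    """Run the recurrence and return the stored bit at every read token."""
    h = 0.0  # single scalar state, independent of sequence length
    outputs = []
    for token in tokens:
        a, b = gates(token)
        h = a * h + b
        if token[0] == "r":
            outputs.append(1 if h > 0 else 0)
    return outputs


# ("w", bit) writes, ("i", bit) is a distractor, ("r", None) reads.
seq = [("w", 1), ("i", 0), ("r", None),
       ("w", 0), ("i", 1), ("i", 1), ("r", None)]
print(run_flip_flop(seq))  # prints [1, 0]
```

In this sketch the memory is a single number no matter how long the input is, and every gate value a(x_t) is either 0 or 1, so the recurrence recalls the most recently written bit exactly rather than approximately.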
