State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple theoretical result stating that SSMs cannot accurately solve any ``truly long-form'' generation problem (in a sense we formally define), undermining their main competitive advantage. However, we show that this limitation can be mitigated by allowing SSMs interactive access to external tools. In fact, we show that given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity (i.e., achieve length generalization). Following our theoretical finding, we demonstrate that tool-augmented SSMs achieve remarkable length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potential efficient alternative to Transformers in interactive tool-based and agentic settings.