The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent methods such as DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and that this approach can sometimes yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting wider adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, affects the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, so parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems the optimization problem is always well-conditioned, whereas for unpredictable systems the conditioning degrades exponentially with the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized and highlighting predictability as a key design principle for parallelizable models.
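To make the idea of recasting evaluation as a parallelizable optimization problem concrete, here is a minimal sketch in the spirit of DEER-style Newton iteration. It is not the authors' implementation. It assumes a hypothetical scalar recurrence $s_t = a \tanh(s_{t-1}) + u_t$ with gain $a = 0.5$ (a contractive, hence predictable, system), and all function names are illustrative. Each Newton step linearizes the recurrence around the current guess, which turns it into an affine recurrence that an associative scan can solve in about $\log_2 T$ rounds. The scan runs serially here in pure Python; on parallel hardware the combines within each round are independent and could run concurrently.

```python
import math


def f(s, u, a):
    # Hypothetical scalar dynamics: s_t = a * tanh(s_{t-1}) + u_t.
    return a * math.tanh(s) + u


def df(s, a):
    # Derivative of f with respect to the previous state.
    return a * (1.0 - math.tanh(s) ** 2)


def sequential_rollout(s0, inputs, a):
    # Conventional evaluation: T dependent steps.
    states, s = [], s0
    for u in inputs:
        s = f(s, u, a)
        states.append(s)
    return states


def combine(first, second):
    # Compose two affine maps x -> J*x + b, applying `first` and then `second`.
    j1, b1 = first
    j2, b2 = second
    return (j2 * j1, j2 * b1 + b2)


def affine_scan(elems):
    # Inclusive Hillis-Steele scan: about log2(T) rounds, and the combines
    # within one round do not depend on each other.
    out = list(elems)
    shift = 1
    while shift < len(out):
        out = [
            out[i] if i < shift else combine(out[i - shift], out[i])
            for i in range(len(out))
        ]
        shift *= 2
    return out


def newton_rollout(s0, inputs, a, tol=1e-10, max_iters=None):
    # Iteratively refine a guess for the whole trajectory at once.
    T = len(inputs)
    max_iters = T + 1 if max_iters is None else max_iters
    guess = [0.0] * T
    for it in range(1, max_iters + 1):
        prev = [s0] + guess[:-1]
        # Linearize f around the current guess: s_t ~ J_t * s_{t-1} + b_t.
        elems = []
        for p, u in zip(prev, inputs):
            jac = df(p, a)
            elems.append((jac, f(p, u, a) - jac * p))
        # Prefix compositions applied to s0 give the updated trajectory.
        new = [j * s0 + b for j, b in affine_scan(elems)]
        err = max(abs(n - g) for n, g in zip(new, guess))
        guess = new
        if err < tol:
            return guess, it
    return guess, max_iters


if __name__ == "__main__":
    T = 64
    inputs = [math.sin(0.3 * t) for t in range(T)]
    ref = sequential_rollout(0.0, inputs, 0.5)
    par, iters = newton_rollout(0.0, inputs, 0.5)
    print("Newton iterations:", iters)
    print("max abs difference:", max(abs(x - y) for x, y in zip(ref, par)))
```

In this contractive setting the number of Newton iterations stays small relative to $T$, which is the regime the abstract calls predictable. The sketch makes no claim about how the iteration behaves for larger gains or other dynamics.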