Predictability Enables Parallelization of Nonlinear State Space Models

The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances such as DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and that this approach can sometimes yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting wider adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, determines the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the $O(T)$ time of conventional sequential evaluation. In contrast, chaotic or unpredictable systems exhibit poor conditioning, so that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems the optimization problem is always well-conditioned, whereas for unpredictable systems the conditioning degrades exponentially with the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized and highlighting predictability as a key design principle for parallelizable models.
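To make the idea of recasting evaluation as an optimization problem concrete, here is a minimal sketch in the spirit of a DEER-style Newton iteration, not the authors' implementation. It assumes a hypothetical scalar update `f(s, u) = 0.5 * tanh(s) + u`, chosen so that its state derivative never exceeds 0.5 in magnitude and perturbations therefore decay (a predictable system). The function names, the input signal, and the tolerance are all illustrative assumptions. Each Newton step linearizes the recurrence around the current guess for the whole trajectory and then solves the resulting linear recurrence. That linear solve is written as a plain loop here for self-containment, but it is an associative scan and could run in logarithmic depth on parallel hardware.

```python
import math


def f(s, u):
    # Hypothetical contractive update (an assumption for illustration):
    # |df/ds| <= 0.5, so small state perturbations shrink over time.
    return 0.5 * math.tanh(s) + u


def df(s, u):
    # Derivative of f with respect to the state s.
    return 0.5 * (1.0 - math.tanh(s) ** 2)


def sequential(s0, inputs):
    # Conventional evaluation: one state after another, O(T) sequential steps.
    states, s = [], s0
    for u in inputs:
        s = f(s, u)
        states.append(s)
    return states


def newton_evaluate(s0, inputs, tol=1e-12, max_iters=100):
    # Treat the whole trajectory as the unknown and refine it with Newton steps.
    T = len(inputs)
    guess = [0.0] * T
    for it in range(1, max_iters + 1):
        prev = [s0] + guess[:-1]  # s_{t-1} under the current guess
        # Elementwise over t, so these evaluations are independent across time.
        fx = [f(p, u) for p, u in zip(prev, inputs)]
        jac = [df(p, u) for p, u in zip(prev, inputs)]
        # Linearized recurrence: new_t = jac_t * new_{t-1} + (fx_t - jac_t * prev_t).
        # Written sequentially here; it is an associative scan in principle.
        new, carry = [], s0
        for t in range(T):
            carry = jac[t] * carry + (fx[t] - jac[t] * prev[t])
            new.append(carry)
        step = max(abs(a - b) for a, b in zip(new, guess))
        guess = new
        if step < tol:
            return guess, it
    return guess, max_iters


if __name__ == "__main__":
    T = 200
    inputs = [math.sin(0.1 * t) for t in range(T)]
    reference = sequential(0.0, inputs)
    states, iters = newton_evaluate(0.0, inputs)
    err = max(abs(a - b) for a, b in zip(reference, states))
    print(f"Newton iterations: {iters}, max deviation from sequential: {err:.2e}")
```

For this contractive example the iteration should settle in far fewer than `T` Newton steps, which is the regime the abstract calls well-conditioned. Replacing `f` with an expansive or chaotic map would be expected to need many more steps and may be numerically unstable in this simple form.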

