A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems

From arXiv. 22 pages, 3 pages of supplementary material, 7 figures. Minor changes to all FOM performance plots because some cases in Fig. 2 were originally wrongly scaled.

This work aims to advance computational methods for projection-based reduced-order models (ROMs) of linear time-invariant (LTI) dynamical systems. For such systems, current practice relies on ROM formulations expressing the state as a rank-1 tensor (i.e., a vector), leading to computational kernels that are memory-bandwidth bound and, therefore, ill-suited for scalable performance on modern many-core and hybrid computing nodes. This weakness can be particularly limiting when tackling many-query studies, where one needs to run a large number of simulations. This work introduces a reformulation of the Galerkin ROM for LTI dynamical systems, called rank-2 Galerkin, which converts the ROM problem from memory-bandwidth bound to compute bound. We present the details of the formulation and its implementation, and demonstrate its utility through numerical experiments using, as a test case, the simulation of elastic seismic shear waves in an axisymmetric domain. We quantify and analyze performance and scaling results for varying numbers of threads and problem sizes. Finally, we present an end-to-end demonstration of using the rank-2 Galerkin ROM for a Monte Carlo sampling study. We show that the rank-2 Galerkin ROM is one order of magnitude more efficient than the rank-1 Galerkin ROM (the current practice) and about 970X more efficient than the full-order model (FOM), while maintaining excellent accuracy in both the mean and statistics of the field.
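The contrast between the rank-1 and rank-2 formulations can be illustrated with a minimal NumPy sketch. It rests on assumptions that the abstract does not spell out: that the rank-2 state collects M independent reduced states as the columns of a matrix, that the reduced system has the form dx/dt = A_hat x + f, and that a forward-Euler step stands in for whatever time integrator is actually used. The names `A_hat`, `step_rank1`, and `step_rank2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def step_rank1(A_hat, x_hat, f_hat, dt):
    """Forward-Euler step for ONE reduced state vector.

    The dominant kernel is a matrix-vector product, which performs
    about 2*K*K flops while reading K*K matrix entries, so it tends
    to be limited by memory bandwidth.
    """
    return x_hat + dt * (A_hat @ x_hat + f_hat)

def step_rank2(A_hat, X_hat, F_hat, dt):
    """Forward-Euler step for M reduced states stored as columns.

    The dominant kernel is a matrix-matrix product, which performs
    about 2*K*K*M flops while reading the same K*K matrix entries,
    so the arithmetic intensity grows with M.
    """
    return X_hat + dt * (A_hat @ X_hat + F_hat)

# Illustrative sizes (assumptions): K reduced dimensions, M samples.
rng = np.random.default_rng(0)
K, M, dt = 64, 16, 1e-3
A_hat = rng.standard_normal((K, K))   # stand-in for a reduced operator
X0 = rng.standard_normal((K, M))      # M initial reduced states
F = rng.standard_normal((K, M))       # M reduced forcing vectors

# Rank-2: advance all M states with a single matrix-matrix product.
X_rank2 = step_rank2(A_hat, X0, F, dt)

# Rank-1: advance each state separately with a matrix-vector product.
X_rank1 = np.column_stack(
    [step_rank1(A_hat, X0[:, j], F[:, j], dt) for j in range(M)]
)

# Both routes give the same states; only the kernel type differs.
assert np.allclose(X_rank1, X_rank2)
```

Under these assumptions the two routes are mathematically equivalent. The potential gain comes from reusing each entry of `A_hat` across all M columns once it has been loaded from memory, which is why a many-query study such as Monte Carlo sampling is a natural fit.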
