Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, cache-oblivious divide-and-conquer trapezoidal algorithms, and Krylov subspace methods. In this paper, we present two efficient parallel algorithms for performing linear stencil computations. Current direct solvers in this domain are computationally inefficient, and Krylov methods require manual labor and mathematical training. We solve these problems for linear stencils by using DFT preconditioning on a Krylov method to achieve a direct solver which is both fast and general. Indeed, while all currently available algorithms for solving general linear stencils perform $\Theta(NT)$ work, where $N$ is the size of the spatial grid and $T$ is the number of timesteps, our algorithms perform $o(NT)$ work. To the best of our knowledge, we give the first algorithms that use fast Fourier transforms to compute final grid data by evolving the initial data for many timesteps at once. Our algorithms handle both periodic and aperiodic boundary conditions, and achieve polynomially better performance bounds (i.e., computational complexity and parallel runtime) than all other existing solutions. Initial experimental results show that implementations of our algorithms that evolve grids of roughly $10^7$ cells for around $10^5$ timesteps run orders of magnitude faster than state-of-the-art implementations for periodic stencil problems, and 1.3$\times$ to 8.5$\times$ faster for aperiodic stencil problems.
翻译:Stencils 计算被广泛用于模拟多维电网多个时间步中物理系统状态的变化。 这个区域中最先进的技术分为三类: 缓冲- 敏化滚动算法、 缓冲- 隐蔽- 隐蔽- 分解- 解剖- 解剖陷阱算法和 Krylov 子空间方法。 在本文中, 我们为进行线性电流计算提供了两种高效的平行算法。 目前, 这个领域的直接解算器在计算上效率不高, Krylov 单元格的方法需要人工劳动和数学培训。 我们通过在 Krylov 方法上使用 DFT 设定快速和一般的直线性循环算法, 从而解决线性系统状况问题。 事实上, 目前所有用于解决一般线性电流电网的算法都是 $@ta( NT) 工作, 其中$是空间电网的大小, $T$是时间步数, 我们的计算方法需要人工- 美元。 根据我们的知识, 我们第一次在直线线型电路速度5 快速递算法上, 快速的递变换数据系统运行运行一个运行运行运行运行运行运行运行运行运行的运行运行运行运行一个周期的轨道。