Stencil computations are a very common class of nested loops in scientific and engineering applications. Exploiting the vector units of modern CPUs is crucial to achieving peak performance. Previous vectorization approaches usually work in the data space, in particular along the innermost unit-stride loop. This leads to the well-known data alignment conflict problem: vector loads overlap because consecutive stencil computations share data. This paper proposes a novel temporal vectorization scheme for stencils. It vectorizes the stencil computation in the iteration space and assembles points with different time coordinates into one vector. Temporal vectorization requires only a small, fixed number of vector reorganizations, independent of the vector length, stencil order, and dimension. Furthermore, it is also applicable to Gauss-Seidel stencils, whose vectorization has not been well studied. The effectiveness of temporal vectorization is demonstrated on various Jacobi and Gauss-Seidel stencils.
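To make the idea of "points with different time coordinates in one vector" concrete, the following is a minimal scalar emulation in plain Python, not the paper's implementation. It assumes a 1D 3-point Jacobi stencil with fixed boundary values. The weights `W`, the emulated vector length `VL`, the lane offset `STRIDE`, and all function names are hypothetical choices made for illustration. Lane `k` of each emulated vector holds the point at time level `k` and position `x - k*STRIDE`, so one "vector operation" advances `VL` time levels at once. The sketch does not model the register reorganizations themselves; it only checks that this lane layout respects the stencil's dependences.

```python
VL = 4                  # emulated vector length = time steps fused per sweep (assumption)
STRIDE = 2              # spatial offset between adjacent lanes; >= 2 for a 3-point stencil
W = (0.25, 0.5, 0.25)   # illustrative stencil weights (assumption)


def step(left, mid, right):
    """One 3-point Jacobi update."""
    return W[0] * left + W[1] * mid + W[2] * right


def jacobi_reference(u0, steps):
    """Plain time loop: one full spatial sweep per time step."""
    u = list(u0)
    for _ in range(steps):
        interior = [step(u[i - 1], u[i], u[i + 1]) for i in range(1, len(u) - 1)]
        u = [u[0]] + interior + [u[-1]]
    return u


def jacobi_temporal(u0):
    """Single spatial sweep that advances VL time levels via emulated lanes."""
    n = len(u0)
    # U[k][x] is the value at time level k; boundary columns stay fixed.
    U = [list(u0)] + [[u0[0]] + [0.0] * (n - 2) + [u0[-1]] for _ in range(VL)]
    for x in range(1, n - 1 + (VL - 1) * STRIDE):
        # One emulated vector: lane k sits at time level k, position x - k*STRIDE.
        # Lanes that fall outside the interior are masked off.
        lanes = [(k, x - k * STRIDE) for k in range(VL)
                 if 1 <= x - k * STRIDE <= n - 2]
        # Compute all lanes first, then write back, as a vector unit would.
        results = [step(U[k][p - 1], U[k][p], U[k][p + 1]) for k, p in lanes]
        for (k, p), value in zip(lanes, results):
            U[k + 1][p] = value
    return U[VL]
```

Because adjacent lanes are separated by `STRIDE >= 2` positions, every value a lane reads was produced in an earlier sweep position, which is why the single sweep gives the same result as `VL` separate time steps.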