Stencil computation is an important class of scientific applications that can be executed efficiently on graphics processing units (GPUs). The out-of-core approach makes it possible to run large-scale stencil codes whose data exceed the limited capacity of GPU memory. However, the performance of GPU-based out-of-core stencil computation is always limited by data transfer between the CPU and the GPU. Many optimizations have been explored to reduce this data transfer, but the use of on-the-fly compression has received little study. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that resolves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps CPU/GPU data transfer with GPU computation. Experimental results show that the proposed method achieved a speedup of 1.2x over the method without compression. Moreover, although the precision loss introduced by compression increased with the number of time steps, it remained trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.
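To make the block decomposition and its data dependency concrete, the following is a minimal CPU-only sketch, not the method proposed here. It assumes a 1D three-point averaging stencil with fixed end cells, a one-cell halo, and a stand-in lossy codec (rounding to float32 followed by `zlib`) in place of a GPU compression library. The names `compress`, `decompress`, `stencil_step`, and `run_out_of_core` are hypothetical, and the overlap of transfer with computation is omitted.

```python
import struct
import zlib


def compress(values):
    """Stand-in lossy codec (assumption): round to float32, then deflate."""
    return zlib.compress(struct.pack(f"<{len(values)}f", *values))


def decompress(blob, count):
    """Inverse of compress(); returns the float32-rounded values."""
    return list(struct.unpack(f"<{count}f", zlib.decompress(blob)))


def stencil_step(cells):
    """One 3-point averaging step; the two end cells are left unchanged."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out


def run_out_of_core(grid, block_size, steps):
    """Process the grid block by block, keeping every block compressed
    while it is "in host memory" and decompressing only what one block
    update needs."""
    n = len(grid)
    starts = list(range(0, n, block_size))
    sizes = [min(block_size, n - s) for s in starts]
    blobs = [compress(grid[s:s + size]) for s, size in zip(starts, sizes)]

    for _ in range(steps):
        new_blobs = []
        for b in range(len(blobs)):
            block = decompress(blobs[b], sizes[b])
            # Halo cells: each block depends on one cell from each
            # neighbouring block, read from the previous time step.
            left = decompress(blobs[b - 1], sizes[b - 1])[-1:] if b > 0 else []
            right = (decompress(blobs[b + 1], sizes[b + 1])[:1]
                     if b + 1 < len(blobs) else [])
            updated = stencil_step(left + block + right)
            # Drop the halo cells and store the block compressed again.
            lo = len(left)
            new_blobs.append(compress(updated[lo:lo + sizes[b]]))
        blobs = new_blobs

    result = []
    for blob, size in zip(blobs, sizes):
        result.extend(decompress(blob, size))
    return result
```

In this sketch every block is rounded to float32 once per time step, so the deviation from an uncompressed run can grow as the number of steps increases, which is the kind of precision loss the abstract refers to. Decompressing whole neighbouring blocks just to read one halo cell is wasteful and is done here only to keep the dependency visible.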