An out-of-core stencil computation code handles data sets whose size exceeds the capacity of GPU memory. However, such a code must frequently stream data to and from the GPU. As a result, data movement between the CPU and GPU usually limits performance. In this work, compression-based optimizations are proposed. First, an on-the-fly compression technique is applied to an out-of-core stencil code, reducing the volume of data copied between CPU and GPU memory. Second, a single working buffer technique is used to reduce GPU memory consumption. Experimental results show that the stencil code using the proposed techniques achieves a 1.1x speedup and reduces GPU memory consumption by 33.0% on an NVIDIA Tesla V100 GPU.
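To make the streaming pattern concrete, the following is a minimal CPU-only sketch of an out-of-core sweep with compressed transfers. It is not the implementation from this work. It assumes a 1D 3-point averaging stencil, uses lossless `zlib` as a stand-in for the on-the-fly compressor, and simulates the "device" in ordinary Python. The function names (`stencil_3pt`, `pack`, `unpack`, `out_of_core_step`) are hypothetical, and the single working buffer technique is not modelled here.

```python
import struct
import zlib


def stencil_3pt(block):
    """3-point average over the interior of `block`; the two end cells act as halo."""
    return [(block[i - 1] + block[i] + block[i + 1]) / 3.0
            for i in range(1, len(block) - 1)]


def pack(values):
    """Compress a list of doubles (stand-in for a compressed CPU-GPU copy)."""
    return zlib.compress(struct.pack(f"{len(values)}d", *values))


def unpack(blob):
    """Decompress a blob produced by pack() back into a list of doubles."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{len(raw) // 8}d", raw))


def out_of_core_step(data, chunk):
    """One stencil sweep over `data`, streamed in chunks of `chunk` cells.

    Returns the updated array and the number of compressed bytes sent
    in the host-to-device direction.
    """
    n = len(data)
    out = list(data)  # the two boundary cells keep their values
    sent = 0
    for start in range(1, n - 1, chunk):
        stop = min(start + chunk, n - 1)
        blob = pack(data[start - 1:stop + 1])   # host: compress chunk plus halo
        sent += len(blob)
        block = unpack(blob)                    # simulated device: decompress
        result = stencil_3pt(block)             # simulated device: compute
        out[start:stop] = unpack(pack(result))  # compressed copy back to host
    return out, sent


if __name__ == "__main__":
    data = [float(i % 7) for i in range(1000)]
    out, sent = out_of_core_step(data, chunk=128)
    print(f"uncompressed: {8 * len(data)} bytes, sent: {sent} bytes")
```

Because the stand-in compressor is lossless, the streamed result matches an in-core sweep exactly. How much transfer volume is saved depends entirely on how compressible the data is, and in a real GPU code the compression and decompression time must also be smaller than the copy time it saves.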