First-order finite volume (FV1) models that use uniform grids are widely used in computational engineering, but can become prohibitively costly to run at fine resolution and/or over large areas. To reduce these costs, FV1 models have adopted adaptive gridding or parallelisation on graphics processing units (GPUs). FV1 models that combine adaptive gridding and parallelisation usually generate the adaptive grid on the central processing unit (CPU), which incurs extra costs for data transfer between the CPU and the GPU. This paper presents a computational innovation that avoids these costs by enabling GPU-resident adaptive gridding, based on the multiresolution analysis (MRA) of Haar wavelets (HWs). It combines the indexing of Z-order curves, to ensure coalesced access to GPU memory, with a newly adopted Parallel Tree Traversal (PTT) that minimises warp divergence of GPU threads. The resulting GPU-resident adaptive gridding method is presented as part of a parallelised HWFV1 hydrodynamic model (GPU-HWFV1). The model's runtime performance is benchmarked against its CPU predecessor (CPU-HWFV1) and a GPU-FV1 uniform-grid model for a range of test cases run on the finest resolution grid accessible to the HWFV1 models. Tests demonstrate the robustness of the results. As for runtime performance, GPU-HWFV1 is up to 400x faster than CPU-HWFV1, while remaining 30x faster than GPU-FV1, especially in applications that require increased depth in the grid resolution and high sensitivity to resolution refinement. The findings are significant, making a strong case for applying the proposed GPU-resident adaptive gridding method to further speed up FV1 models.
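The two ingredients named above, Z-order indexing and Haar wavelet analysis, can be illustrated with a minimal Python sketch. It is not the GPU-HWFV1 implementation: the function names (`part1by1`, `morton_encode`, `haar_encode`, `is_significant`), the unnormalised Haar scaling and the threshold `epsilon` are assumptions made for illustration only. The sketch shows that the four children of any quadtree cell occupy consecutive Z-order indices, which is the property that allows neighbouring threads to read neighbouring memory, and that one Haar level reduces those four values to an average plus three detail coefficients whose magnitude can flag where refinement is needed.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so that a zero bit separates each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_encode(x: int, y: int) -> int:
    """Z-order (Morton) index of cell (x, y): bits of x and y interleaved, x lowest."""
    return part1by1(x) | (part1by1(y) << 1)


def haar_encode(c0: float, c1: float, c2: float, c3: float):
    """One level of an unnormalised 2D Haar transform (illustrative scaling).

    c0..c3 are four sibling cells in Z-order: (0,0), (1,0), (0,1), (1,1).
    Returns the parent average and three detail coefficients.
    """
    avg = (c0 + c1 + c2 + c3) / 4.0
    d_h = (c0 - c1 + c2 - c3) / 4.0  # variation along x
    d_v = (c0 + c1 - c2 - c3) / 4.0  # variation along y
    d_d = (c0 - c1 - c2 + c3) / 4.0  # diagonal variation
    return avg, (d_h, d_v, d_d)


def is_significant(details, epsilon: float) -> bool:
    """Hypothetical refinement test: keep the children if any detail exceeds epsilon."""
    return max(abs(d) for d in details) > epsilon


# Children of parent cell (px, py) sit at 4 * parent_index + {0, 1, 2, 3}.
px, py = 3, 5
parent = morton_encode(px, py)
children = [morton_encode(2 * px + dx, 2 * py + dy) for dy in (0, 1) for dx in (0, 1)]
assert children == [4 * parent + k for k in range(4)]
```

Because siblings are contiguous in this ordering, a coarsening pass can read each group of four with a single strided index rather than a pointer-based tree lookup, which is one plausible reason such an ordering suits GPU memory access.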