Fully GPU resident wavelet-based adaptive gridding: application to two-dimensional finite volume hydrodynamic modelling

First-order finite volume (FV1) models that use uniform grids are widely used in computational engineering, but can become prohibitively costly to run at fine resolution and/or over large areas. To reduce these costs, FV1 models have adopted adaptive gridding or parallelisation on graphics processing units (GPUs). FV1 models that combine adaptive gridding and parallelisation usually generate the adaptive grid on the central processing unit (CPU), incurring extra costs for data transfer between the CPU and the GPU. This paper presents a computational innovation that avoids these costs by enabling GPU resident adaptive gridding, based on the multiresolution analysis (MRA) of Haar wavelets (HWs). It combines Z-order curve indexing, to ensure coalesced access to GPU memory, with a newly adopted Parallel Tree Traversal (PTT) that minimises warp divergence among GPU threads. The resulting GPU resident adaptive gridding method is presented as part of a parallelised HWFV1 hydrodynamic model (GPU-HWFV1). The model's runtime performance is benchmarked against its CPU predecessor (CPU-HWFV1) and a GPU-FV1 uniform grid model for a range of test cases run on the finest resolution grid accessible to the HWFV1 models. Tests demonstrate the robustness of the results. As for runtime performance, GPU-HWFV1 is up to 400x faster than CPU-HWFV1, while remaining 30x faster than GPU-FV1, especially in applications that require increased depth in the grid resolution and high sensitivity to resolution refinement. The findings are significant, making a strong case for applying the proposed GPU resident adaptive gridding method to further speed up FV1 models.
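The Z-order curve indexing mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`part1by1`, `morton_encode`, `morton_decode`, `children`) and the 16-bit coordinate limit are assumptions made for illustration. The sketch interleaves the bits of a cell's (i, j) position into a single index, so that the four children of any parent cell occupy consecutive indices. This kind of contiguity is what lets neighbouring GPU threads read neighbouring memory locations.

```python
def part1by1(n: int) -> int:
    """Spread the lower 16 bits of n so that a zero bit follows each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def compact1by1(n: int) -> int:
    """Inverse of part1by1: gather every other bit back into the lower 16 bits."""
    n &= 0x55555555
    n = (n | (n >> 1)) & 0x33333333
    n = (n | (n >> 2)) & 0x0F0F0F0F
    n = (n | (n >> 4)) & 0x00FF00FF
    n = (n | (n >> 8)) & 0x0000FFFF
    return n


def morton_encode(i: int, j: int) -> int:
    """Z-order index of cell (i, j): bits of i in even positions, bits of j in odd."""
    return part1by1(i) | (part1by1(j) << 1)


def morton_decode(code: int) -> tuple[int, int]:
    """Recover (i, j) from a Z-order index."""
    return compact1by1(code), compact1by1(code >> 1)


def children(code: int) -> list[int]:
    """Z-order indices, on the next finer level, of the four children of a cell."""
    return [4 * code + k for k in range(4)]


if __name__ == "__main__":
    parent = morton_encode(3, 5)
    print(parent, morton_decode(parent), children(parent))
```

In this layout, refining a coarse cell at (i, j) yields fine cells at (2i + a, 2j + b) with a, b in {0, 1}, whose indices are 4 * code + a + 2b. A parent and its children can therefore be related by simple integer arithmetic, without pointer-based tree storage.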

