Discontinuous Galerkin (dG) methods on meshes consisting of polygonal/polyhedral (henceforth collectively termed \emph{polytopic}) elements have received considerable attention in recent years. Because these methods typically use basis functions defined in the physical frame, and because quadrature over general polytopic elements is challenging, the matrix-assembly step is often computationally cumbersome. To address this important practical issue, this work proposes two parallel assembly algorithms, implemented on CUDA-enabled graphics cards, for the interior penalty dG method on polytopic meshes for various classes of linear PDE problems. We consider both parallelization on a single GPU and implementation on distributed GPU nodes. The reported results show almost linear scalability of the quadrature step with respect to the number of GPU cores used, since the assembly step requires no communication. This, in turn, supports the claim that polytopic dG methods can be implemented extremely efficiently, as any assembly computing time overhead compared to finite elements on `standard' simplicial or box-type meshes can be effectively circumvented by the proposed algorithms.