Techniques, Tricks and Algorithms for Efficient GPU-Based Processing of Higher Order Hyperbolic PDEs

GPU computing is expected to play an integral part in all modern Exascale supercomputers. It is also expected that higher order Godunov schemes will make up a significant fraction of the application mix on such supercomputers. It is, therefore, very important to prepare the community of users of higher order schemes for hyperbolic PDEs for this emerging opportunity. We focus on three broad and high-impact areas where higher order Godunov schemes are used. The first area is computational fluid dynamics (CFD). The second is computational magnetohydrodynamics (MHD), which has an involution constraint that has to be mimetically preserved. The third is computational electrodynamics (CED), which has involution constraints and also extremely stiff source terms. Together, these three diverse uses of higher order Godunov methodology cover many of the most important application areas. In all three cases, we show that the optimal use of algorithms, techniques and tricks, along with the use of OpenACC, yields superlative speedups on GPUs! As a bonus, we find a most remarkable and desirable result: some higher order schemes, with their larger operation count per zone, show better speedup on GPUs than lower order schemes. In other words, the GPU is an optimal stratagem for overcoming the higher computational complexities of higher order schemes! Several avenues for future improvement have also been identified. A scalability study is presented for a real-world application using GPUs and comparable numbers of high-end multicore CPUs. It is found that GPUs offer a substantial performance benefit over a comparable number of CPUs, especially when all the methods designed in this paper are used.
