GPU Optimization for High-Quality Kinetic Fluid Simulation

Fluid simulations are often performed using the incompressible Navier-Stokes equations (INSE), which lead to sparse linear systems that are difficult to solve efficiently in parallel. Recently, kinetic methods based on the adaptive-central-moment multiple-relaxation-time (ACM-MRT) model have demonstrated an impressive ability to simulate both laminar and turbulent flows, with quality matching or surpassing that of state-of-the-art INSE solvers. Furthermore, because of its local formulation, this method lends itself to highly scalable implementations on parallel systems such as GPUs. However, an efficient ACM-MRT-based kinetic solver must overcome a number of computational challenges, especially when dealing with complex solids inside the fluid domain. In this paper, we present multiple novel GPU optimization techniques for efficiently implementing high-quality ACM-MRT-based kinetic fluid simulations in domains containing complex solids. Our techniques include a new communication-efficient data layout, a load-balanced immersed-boundary method, a multi-kernel launch method that uses a simplified formulation of the ACM-MRT calculations to enable greater parallelism, and the integration of these techniques into a parametric cost model that enables automated parameter search for optimal execution performance. We also extend our method to multi-GPU systems to enable large-scale simulations. To demonstrate the state-of-the-art performance and high visual quality of our solver, we present extensive experimental results and comparisons to other solvers.
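To illustrate what "local formulation" means for a kinetic solver, the following is a minimal sketch of one collide-and-stream update on a 2D periodic grid. It is not the ACM-MRT model described above: it assumes the much simpler single-relaxation-time (BGK) collision on a D2Q9 lattice and runs in NumPy on the CPU rather than on a GPU. The names `equilibrium`, `step`, and `tau` are illustrative choices, not taken from the paper.

```python
import numpy as np

# D2Q9 lattice velocities (cx, cy) and weights: standard lattice Boltzmann constants.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions, shape (9, ny, nx)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def step(f, tau=0.6):
    """One BGK collide-and-stream update (assumed model, not ACM-MRT)."""
    # Macroscopic moments are computed per cell from that cell's own data.
    rho = f.sum(axis=0)
    ux = (C[:, 0, None, None] * f).sum(axis=0) / rho
    uy = (C[:, 1, None, None] * f).sum(axis=0) / rho
    # Collision: purely local, so every cell can be processed independently.
    f = f - (f - equilibrium(rho, ux, uy)) / tau
    # Streaming: each population shifts one cell along its lattice velocity
    # (periodic wrap-around; axis 0 is y, axis 1 is x).
    for i, (cx, cy) in enumerate(C):
        f[i] = np.roll(f[i], shift=(cy, cx), axis=(0, 1))
    return f
```

Because the collision reads only a cell's own populations and the streaming touches only nearest neighbors, no global linear system has to be solved, which is the property that makes such methods attractive for GPUs. In this sketch, calling `step` repeatedly on a state built with `equilibrium` conserves total mass on the periodic grid.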
