We consider the distributed training of large-scale neural networks that serve as PDE solvers producing full-field outputs. Specifically, we consider neural solvers for the generalized 3D Poisson equation over megavoxel domains. We present a scalable framework that integrates two distinct advances. First, we accelerate the training of a large model with a method analogous to the multigrid technique used in numerical linear algebra: the network is trained on a hierarchy of inputs of increasing resolution, visited in sequences analogous to the 'V', 'W', 'F', and 'Half-V' cycles used in multigrid approaches. Second, in conjunction with the multigrid approach, we implement a distributed deep learning framework that significantly reduces the time to solve. We show the scalability of this approach on both GPU clusters (Azure VMs on Cloud) and CPU clusters (PSC Bridges2). The approach is deployed to train a generalized 3D Poisson solver that scales well, predicting full-field solutions at resolutions up to 512x512x512 for a high-dimensional family of inputs.
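To make the cycle analogy concrete, the following is a minimal sketch of how a resolution schedule for multigrid-style training might be generated. It follows the classical multigrid conventions for V, W, and F cycles (starting at the finest level); the order actually used for training in this work may differ. The function names `cycle_levels` and `resolution_schedule`, the factor-of-two spacing between levels, the default of three levels, and the reading of 'Half-V' as a single coarse-to-fine sweep are all assumptions made for illustration.

```python
from itertools import groupby


def _collapse(seq):
    """Drop consecutive repeats, e.g. [1, 0, 0, 1] -> [1, 0, 1]."""
    return [level for level, _ in groupby(seq)]


def cycle_levels(kind, n_levels):
    """Return the sequence of grid levels visited in one cycle.

    Level 0 is the coarsest grid and n_levels - 1 the finest.
    The recursions are the textbook multigrid ones (an assumption here).
    """
    top = n_levels - 1

    def mu(level, gamma):
        # gamma = 1 yields a V cycle, gamma = 2 a W cycle.
        if level == 0:
            return [0]
        return [level] + mu(level - 1, gamma) * gamma + [level]

    def f(level):
        # F cycle: one F cycle followed by one V cycle on the next coarser level.
        if level == 0:
            return [0]
        return [level] + f(level - 1) + mu(level - 1, 1) + [level]

    if kind == "V":
        seq = mu(top, 1)
    elif kind == "W":
        seq = mu(top, 2)
    elif kind == "F":
        seq = f(top)
    elif kind == "Half-V":
        # Assumed meaning: only the ascending (coarse-to-fine) half of a V cycle.
        seq = list(range(n_levels))
    else:
        raise ValueError(f"unknown cycle kind: {kind!r}")
    return _collapse(seq)


def resolution_schedule(kind, finest=512, n_levels=3):
    """Map grid levels to per-axis resolutions, halving at each coarser level."""
    return [finest >> (n_levels - 1 - level) for level in cycle_levels(kind, n_levels)]


if __name__ == "__main__":
    for kind in ("V", "W", "F", "Half-V"):
        print(kind, resolution_schedule(kind))
```

In a training loop, each entry of the schedule would select the input resolution for the next block of epochs, so that most of the optimization work happens on cheap coarse grids before the network is refined at the target resolution.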