The Multilevel Monte Carlo (MLMC) method has proven to be an effective variance-reduction statistical method for Uncertainty Quantification (UQ) in Partial Differential Equation (PDE) models, combining model computations at different levels to create an accurate estimate. Still, the computational complexity of the resulting method is extremely high, particularly for 3D models, which requires advanced algorithms for the efficient exploitation of High Performance Computing (HPC). In this article we present a new implementation of the MLMC in massively parallel computer architectures, exploiting parallelism within and between each level of the hierarchy. The numerical approximation of the PDE is performed using the finite element method but the algorithm is quite general and could be applied to other discretization methods as well, although the focus is on parallel sampling. The two key ingredients of an efficient parallel implementation are a good processor partition scheme together with a good scheduling algorithm to assign work to different processors. We introduce a multiple partition of the set of processors that permits the simultaneous execution of different levels and we develop a dynamic scheduling algorithm to exploit it. The problem of finding the optimal scheduling of distributed tasks in a parallel computer is an NP-complete problem. We propose and analyze a new greedy scheduling algorithm to assign samples and we show that it is a 2-approximation, which is the best that may be expected under general assumptions. On top of this result we design a distributed memory implementation using the Message Passing Interface (MPI) standard. Finally we present a set of numerical experiments illustrating its scalability properties.
翻译:多层次蒙特卡洛(MLMC)方法已被证明是部分差异量化模型中不确定性量化模型(UQ)的有效减少差异统计方法,它综合了不同层次的模型计算,以得出准确的估计数。不过,由此产生的方法的计算复杂性极高,特别是3D模型。 3D模型要求为高效利用高性能计算器(HPC)提供高级算法,这要求为高效利用高性能计算器(HPC)进行高效利用。在文章中,我们介绍了在大规模平行的计算机结构中实施MLMC的新应用。PDE的数字近似值是使用有限要素方法进行的,但算法相当笼统,可以应用于其他离异性化方法,尽管重点是平行抽样。高效平行实施方法的两个主要要素是良好的处理器分割法,以及将工作分配给不同的处理器(HPC)。我们引入一套处理器的多重分隔法,允许同时执行不同级别,我们开发一套动态的定时算法来利用它。在目前采用最优化的内值推法时,在分配的内程中找到一个最佳的内程排序,我们最后在使用一种平行的模型上显示一个我们所分配的内程。