As modern massively parallel clusters grow larger and their compute nodes more powerful, traditional parallel eigensolvers, such as direct solvers, struggle to keep pace with hardware evolution and to scale efficiently because of additional layers of communication and synchronization. This difficulty is especially pronounced when porting traditional libraries to heterogeneous computing architectures equipped with accelerators, such as Graphics Processing Units (GPUs). Recently, there have been significant scientific contributions to the development of filter-based subspace eigensolvers for computing a partial eigenspectrum. The simpler structure of this type of algorithm makes it easier to avoid the communication and synchronization bottlenecks typical of direct solvers. The Chebyshev Accelerated Subspace Eigensolver (ChASE) is a modern subspace eigensolver that computes a subset of extremal eigenpairs of large-scale Hermitian eigenproblems, accelerated by a filter based on Chebyshev polynomials. In this work, we extend our previous work on ChASE by adding support for distributed hybrid CPU-multi-GPU computing architectures. Our tests show that ChASE achieves very good scaling performance up to 144 nodes with 526 NVIDIA A100 GPUs in total on dense eigenproblems of size up to $360$k.
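To make the idea of a Chebyshev-filtered subspace eigensolver concrete, the following is a minimal single-node NumPy sketch of the general technique. It is not ChASE's implementation or API. The function names, the default parameters (`nex`, `degree`, `tol`), and the crude infinity-norm upper bound on the spectrum are all assumptions made for illustration. A production solver would typically estimate the spectral bounds more tightly and distribute the matrix-block products, which dominate the cost, across nodes and GPUs.

```python
import numpy as np


def chebyshev_filter(A, V, degree, a, b):
    """Apply T_degree, with [a, b] mapped to [-1, 1], to the block V.

    Components with eigenvalues inside [a, b] stay bounded, while
    components below a are amplified (three-term recurrence).
    """
    e = (b - a) / 2.0  # half-width of the damped interval
    c = (b + a) / 2.0  # center of the damped interval
    prev = V
    curr = (A @ V - c * V) / e  # T_1 applied to V
    for _ in range(2, degree + 1):
        nxt = 2.0 * (A @ curr - c * curr) / e - prev
        prev, curr = curr, nxt
    return curr


def filtered_subspace_iteration(A, nev, nex=6, degree=20, tol=1e-8,
                                max_iter=200, seed=0):
    """Approximate the nev smallest eigenpairs of a Hermitian matrix A.

    nex extra vectors enlarge the search space (illustrative default).
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((n, nev + nex)))
    # Assumption: the infinity norm is a safe but loose bound on |lambda_max|.
    b = np.linalg.norm(A, np.inf)
    for _ in range(max_iter):
        # Rayleigh-Ritz: project A onto span(V), solve the small problem.
        H = V.conj().T @ A @ V
        ritz, S = np.linalg.eigh(H)
        V = V @ S
        # Residual norms of the wanted Ritz pairs.
        R = A @ V[:, :nev] - V[:, :nev] * ritz[:nev]
        if np.linalg.norm(R, axis=0).max() < tol:
            break
        # Damp everything above the largest Ritz value of the search space.
        a = ritz[-1]
        V = chebyshev_filter(A, V, degree, a, b)
        V, _ = np.linalg.qr(V)  # re-orthonormalize the filtered block
    return ritz[:nev], V[:, :nev]


if __name__ == "__main__":
    # Small dense symmetric test matrix with known eigenvalues 1..200.
    rng = np.random.default_rng(1)
    n = 200
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = (Q * np.arange(1.0, n + 1)) @ Q.T
    A = (A + A.T) / 2
    vals, _ = filtered_subspace_iteration(A, nev=4)
    print(vals)
```

The sketch shows why this class of algorithm suits accelerators: apart from the small projected eigenproblem and the QR step, all work is in repeated matrix-block products inside the filter.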