In [Van Beeumen et al., HPC Asia 2020, https://www.doi.org/10.1145/3368474.3368497] a scalable and matrix-free eigensolver was proposed for studying the many-body localization (MBL) transition of two-level quantum spin chain models with nearest-neighbor $XX+YY$ interactions plus $Z$ terms. This type of problem is computationally challenging because the vector space dimension grows exponentially with the physical system size, and averaging over different configurations of the random disorder is needed to obtain relevant statistical behavior. For each eigenvalue problem, eigenvalues from different regions of the spectrum and their corresponding eigenvectors need to be computed. Traditionally, the interior eigenstates for a single eigenvalue problem are computed via the shift-and-invert Lanczos algorithm. Due to the extremely high memory footprint of the LU factorizations, this technique is not well suited for a large number of spins $L$; e.g., one needs thousands of compute nodes on modern high-performance computing infrastructures to go beyond $L = 24$. The matrix-free approach does not suffer from this memory bottleneck; however, its scalability is limited by a computation and communication imbalance. We present a few strategies to reduce this imbalance and to significantly enhance the scalability of the matrix-free eigensolver. To optimize the communication performance, we leverage the consistent space runtime, CSPACER, and show its efficiency in accelerating the irregular MBL communication patterns at scale compared to optimized MPI non-blocking two-sided and one-sided RMA implementation variants. The efficiency and effectiveness of the proposed algorithm are demonstrated by computing eigenstates on a massively parallel many-core high-performance computer.
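To make the term "matrix-free" concrete, the following is a minimal single-process sketch of applying a spin-chain Hamiltonian to a vector without ever storing the matrix. It is an illustration under stated assumptions, not the distributed implementation of the cited work: it assumes spin-1/2 sites on an open chain, uses the identity $S^x_i S^x_{i+1} + S^y_i S^y_{i+1} = \tfrac{1}{2}(S^+_i S^-_{i+1} + S^-_i S^+_{i+1})$, and takes the $Z$ terms to be on-site fields $h_i S^z_i$ plus an optional $S^z_i S^z_{i+1}$ coupling. The names `apply_hamiltonian`, `fields`, `coupling`, and `zz` are hypothetical.

```python
def apply_hamiltonian(x, fields, coupling=1.0, zz=0.0):
    """Return y = H x for an open spin-1/2 chain without storing H.

    Assumed model (illustrative only):
      H = coupling * sum_i (Sx_i Sx_{i+1} + Sy_i Sy_{i+1})
          + zz * sum_i Sz_i Sz_{i+1} + sum_i fields[i] * Sz_i
    Basis states are integers; bit i set means spin i is up.
    """
    n_sites = len(fields)
    dim = 1 << n_sites  # vector space dimension grows as 2**n_sites
    if len(x) != dim:
        raise ValueError("x must have length 2**len(fields)")
    y = [0.0] * dim
    for state in range(dim):
        amp = x[state]
        if amp == 0.0:
            continue
        # Diagonal Z terms: Sz = +1/2 for an up spin, -1/2 for a down spin.
        diag = 0.0
        for i in range(n_sites):
            sz_i = 0.5 if (state >> i) & 1 else -0.5
            diag += fields[i] * sz_i
            if i + 1 < n_sites:
                sz_j = 0.5 if (state >> (i + 1)) & 1 else -0.5
                diag += zz * sz_i * sz_j
        y[state] += diag * amp
        # Off-diagonal XX+YY terms: swap antiparallel neighbors, amplitude 1/2.
        for i in range(n_sites - 1):
            if ((state >> i) & 1) != ((state >> (i + 1)) & 1):
                flipped = state ^ (0b11 << i)
                y[flipped] += 0.5 * coupling * amp
    return y
```

In this sketch the memory cost is two vectors of length $2^L$, whereas a shift-and-invert approach must additionally hold LU factors. The scattered writes to `y[flipped]` are the kind of irregular access that becomes irregular communication once the vector is distributed across nodes.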