Lifetime-based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer

High-performance classical simulators for quantum circuits, in particular those based on the tensor network contraction algorithm, have become an important tool for validating noisy quantum computing. To address memory limitations, the slicing technique is used to reduce tensor dimensions, but it can also introduce additional computation overhead that greatly slows down overall performance. This paper proposes novel lifetime-based methods to reduce the slicing overhead and improve computing efficiency, including an interpretation method to deal with slicing overhead, an in-place slicing strategy to find the smallest slicing set, and an adaptive tensor network contraction path refiner customized for the Sunway architecture. Experiments show that in most cases the slicing overhead with our in-place slicing strategy is lower than that of Cotengra, which is currently the most widely used graph path optimization software. Finally, the resulting simulation time for the Sycamore quantum processor RQC is reduced to 89.1 s, with a sustained single-precision performance of 308.6 Pflops using over 41M cores to generate 1M correlated samples, which is more than a 5-fold performance improvement over the 60.4 Pflops of the 2021 Gordon Bell Prize work.
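To make the notion of slicing overhead concrete, the following is a minimal NumPy sketch of a toy four-tensor chain. It is not the paper's method: the function names `contract_full` and `contract_sliced`, the tensor shapes, and the fixed contraction order are all assumptions chosen for illustration. Slicing the shared index `l` means each slice holds smaller pieces of `c` and `d`, but the partial contraction of `a` and `b`, which does not involve `l`, is repeated in every slice. That repeated work is the kind of overhead the abstract refers to.

```python
import numpy as np


def contract_full(a, b, c, d):
    """Contract the chain a[i,j] b[j,k] c[k,l] d[l,m] without slicing."""
    ab = a @ b            # shape (i, k)
    cd = c @ d            # shape (k, m), sums over the index l
    return ab @ cd        # shape (i, m)


def contract_sliced(a, b, c, d):
    """Contract the same chain with the index l sliced (hypothetical helper).

    Each slice fixes l to one value, so only a column of c and a row of d
    are needed at a time. Returns the result and the number of redundant
    recomputations of the l-independent intermediate a @ b.
    """
    n_slices = c.shape[1]
    total = np.zeros((a.shape[0], d.shape[1]))
    redundant = 0
    for l in range(n_slices):
        # a @ b does not depend on l, yet a naive sliced run redoes it
        # for every slice: this is the slicing overhead.
        ab = a @ b
        if l > 0:
            redundant += 1
        cd_l = np.outer(c[:, l], d[l, :])   # slice of the (k, m) intermediate
        total += ab @ cd_l
    return total, redundant


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5))
b = rng.standard_normal((5, 6))
c = rng.standard_normal((6, 3))
d = rng.standard_normal((3, 2))

full = contract_full(a, b, c, d)
sliced, redundant = contract_sliced(a, b, c, d)
print(np.allclose(full, sliced), redundant)
```

Summing the per-slice results reproduces the unsliced contraction, while the counter records how many times the slice-independent intermediate was recomputed. Choosing which indices to slice, and ordering the contraction so that little work is repeated, is what determines the overhead in practice.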
