High-performance classical simulation of quantum circuits, in particular with tensor network contraction algorithms, has become an important tool for validating noisy quantum computers. To address memory limitations, the slicing technique is used to reduce tensor dimensions, but it can also introduce additional computation overhead that greatly slows down overall performance. This paper proposes novel lifetime-based methods to reduce the slicing overhead and improve computing efficiency, including an interpretation method to deal with slicing overhead, an in-place slicing strategy to find the smallest slicing set, and an adaptive tensor network contraction path refiner customized for the Sunway architecture. Experiments show that in most cases the slicing overhead of our in-place slicing strategy is lower than that of cotengra, currently the most widely used graph path optimization software. Finally, the resulting simulation time for the Sycamore quantum processor RQC is reduced to 96.1 s, with a sustained single-precision performance of 308.6 Pflops using over 41M cores to generate 1M correlated samples, which is more than a 5-fold performance improvement over the 60.4 Pflops of the 2021 Gordon Bell Prize work.
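To make the trade-off between slicing and overhead concrete, the following is a minimal sketch, not the paper's method. It assumes a hypothetical four-tensor chain A[i,j], B[j,m], C[m,k], D[k,l] contracted in the fixed order ((A·B)·C)·D, and slices the index k. The function names and the multiply-add cost model are illustrative assumptions. Because the intermediate A·B does not involve k, it is recomputed in every slice, which is one way slicing overhead can arise.

```python
import numpy as np


def contract_full(a, b, c, d):
    """Contract the chain A[i,j] B[j,m] C[m,k] D[k,l] without slicing."""
    ab = a @ b            # intermediate of shape (i, m), independent of k
    return (ab @ c) @ d   # result of shape (i, l)


def contract_sliced(a, b, c, d):
    """Contract the same chain while slicing the index k.

    Each slice fixes k to one value, so C and D shrink along that
    dimension. The partial results are summed over all values of k.
    """
    total = np.zeros((a.shape[0], d.shape[1]), dtype=np.result_type(a, b, c, d))
    for v in range(c.shape[1]):
        ab = a @ b                        # recomputed in every slice (overhead)
        c_slice = c[:, v:v + 1]           # shape (m, 1)
        d_slice = d[v:v + 1, :]           # shape (1, l)
        total += (ab @ c_slice) @ d_slice
    return total


def slicing_overhead(i, j, m, k, l):
    """Ratio of multiply-add counts (sliced / full) for this fixed path.

    Assumes a matrix product of shapes (p, q) and (q, r) costs p*q*r.
    """
    full = i * j * m + i * m * k + i * k * l
    sliced = k * (i * j * m + i * m + i * l)
    return sliced / full


rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((4, 4)) for _ in range(4))
full_result = contract_full(A, B, C, D)
sliced_result = contract_sliced(A, B, C, D)
overhead = slicing_overhead(4, 4, 4, 4, 4)
```

In this toy setup the sliced contraction returns the same tensor as the unsliced one, while the cost model reports a multiply-add ratio above 1. Choosing which indices to slice so that little work is repeated is the kind of problem a slicing strategy has to solve.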