Faster classical simulation has become essential for validating quantum computers, and tensor network contraction is a widely applied simulation approach. Because memory is limited, slicing is adopted to cut the memory footprint by reducing tensor dimensions, at the cost of additional computation overhead. This paper proposes novel lifetime-based methods to reduce the slicing overhead and improve computing efficiency, including an interpretation of slicing overhead, an in-place slicing strategy to find the smallest slicing set, a corresponding iterative method, and an adaptive path refiner customized for the Sunway architecture. Experiments show that our in-place slicing strategy reduces the slicing overhead to less than 1.2 and obtains 100-200x speedups over related efforts. The resulting simulation time is reduced from 304 s (2021 Gordon Bell Prize) to 149.2 s on the Sycamore random quantum circuit (RQC), with a sustained mixed-precision performance of 416.5 Pflops using over 41M cores to simulate 1M correlated samples.
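To make the notion of slicing overhead concrete, the following is a minimal sketch on a toy network, not the paper's method. It contracts a ring of four matrices, sum over i, j, k, l of A[i][j] B[j][k] C[k][l] D[l][i], once directly and once with the index j sliced, so that each slice fixes j and the slices are summed. The overhead is assumed here to be the ratio of scalar multiplications in the sliced run to those in the unsliced run, which is a common definition but is not spelled out in the abstract. All function names are hypothetical, and the contraction path is fixed by hand.

```python
import random

def matmul(X, Y, cost):
    """Multiply nested-list matrices; add the multiplication count to cost[0]."""
    n, m, p = len(X), len(Y), len(Y[0])
    cost[0] += n * m * p
    return [[sum(X[a][b] * Y[b][c] for b in range(m)) for c in range(p)]
            for a in range(n)]

def trace_of_product(X, Y, cost):
    """Return sum over a, b of X[a][b] * Y[b][a], i.e. tr(X Y)."""
    n, m = len(X), len(Y)
    cost[0] += n * m
    return sum(X[a][b] * Y[b][a] for a in range(n) for b in range(m))

def contract_full(A, B, C, D):
    """Unsliced contraction along the path (A B), (C D), then the trace."""
    cost = [0]
    T1 = matmul(A, B, cost)          # T1[i][k]
    T2 = matmul(C, D, cost)          # T2[k][i]
    return trace_of_product(T1, T2, cost), cost[0]

def contract_sliced(A, B, C, D):
    """Same path, but index j is sliced: one sub-contraction per value of j."""
    cost = [0]
    total = 0
    for j in range(len(B)):
        A_j = [[row[j]] for row in A]   # column j of A, shape d x 1
        B_j = [B[j]]                    # row j of B, shape 1 x d
        T1 = matmul(A_j, B_j, cost)     # operands carrying j are smaller
        T2 = matmul(C, D, cost)         # does not involve j, so it is repeated
        total += trace_of_product(T1, T2, cost)
    return total, cost[0]

def random_matrix(d, rng):
    """Integer entries keep the comparison between both runs exact."""
    return [[rng.randint(-3, 3) for _ in range(d)] for _ in range(d)]

rng = random.Random(0)
d = 4
A, B, C, D = (random_matrix(d, rng) for _ in range(4))

full_value, full_cost = contract_full(A, B, C, D)
sliced_value, sliced_cost = contract_sliced(A, B, C, D)
overhead = sliced_cost / full_cost

print("values agree:", full_value == sliced_value)
print("multiplications:", full_cost, "unsliced vs", sliced_cost, "sliced")
print("slicing overhead:", round(overhead, 3))
```

With d = 4 the unsliced run needs 144 multiplications and the sliced run needs 384, an overhead of about 2.67. The extra work comes entirely from the step C D, which does not involve the sliced index and is recomputed in every slice. This toy case only illustrates where overhead originates; it is too small to show a meaningful memory saving, and it does not attempt the slicing-set selection or path refinement described in the abstract.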