Quantum circuit simulation provides the foundation for developing quantum algorithms and verifying quantum supremacy. Among the various simulation methods, tensor network contraction has grown in popularity because it can simulate a larger number of qubits. During tensor contraction, the input tensors are reshaped into matrices and multiplied by a GEMM operation, and these GEMM operations can account for up to 90% of the total computation time. GEMM throughput can be improved by using mixed-precision hardware such as Tensor Cores, but a straightforward implementation yields insufficient fidelity for deep and large quantum circuits. Prior work has demonstrated that compensated summation with careful handling of the rounding mode can fully recover the FP32 precision of SGEMM even when using TF32 or FP16 Tensor Cores. When such techniques are applied to quantum circuit simulation, the exponent range becomes a critical issue: TF32 supports almost the same exponent range as FP32, whereas FP16 supports a much smaller one. In this work, we use exponent range statistics of the input tensor elements to select which Tensor Cores to use for each GEMM. We evaluate our method on Random Circuit Sampling (RCS), including Sycamore's quantum circuit, and show that throughput improves by up to 1.86 times while accuracy is maintained.
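The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `exponent` and `select_tensor_core`, the threshold parameter `max_out_of_range`, and the decision rule itself are hypothetical. The only facts assumed are that normal FP16 values have unbiased exponents in the range -14 to 15, and that the input tensor has been flattened to a list of real numbers (real and imaginary parts listed separately).

```python
import math

# Assumption: unbiased exponents of normal FP16 values span [-14, 15].
FP16_MIN_EXP = -14
FP16_MAX_EXP = 15


def exponent(x):
    """Return e such that 2**e <= |x| < 2**(e + 1) for a nonzero float x."""
    _, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    return e - 1


def select_tensor_core(values, max_out_of_range=0.0):
    """Choose 'fp16' or 'tf32' from exponent statistics (hypothetical rule).

    'fp16' is chosen when the fraction of nonzero elements whose exponent
    lies outside the FP16 normal range is at most max_out_of_range;
    otherwise 'tf32' is chosen for its wider exponent range.
    """
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return "fp16"  # nothing can underflow or overflow
    outside = sum(
        1 for v in nonzero
        if not FP16_MIN_EXP <= exponent(v) <= FP16_MAX_EXP
    )
    return "fp16" if outside / len(nonzero) <= max_out_of_range else "tf32"
```

For example, `select_tensor_core([0.5, 0.25, 1e-3])` selects FP16 because every element fits the FP16 normal range, while a tensor containing `1e-8` falls back to TF32 under the default threshold. In practice the threshold would need to be tuned against the fidelity target, and any scaling applied by the error-correction scheme would shift the usable range.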