Decoding surface codes with deep reinforcement learning and probabilistic policy reuse

Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A large body of theoretical work has produced various types of QEC codes. One notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Recently, machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have made some progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
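To make the two ingredients named in DDQN-PPR concrete, the following is a minimal sketch, not the implementation used in this work. It assumes one common formulation of each idea: the double Q-learning target, in which the online network selects the next action and the target network evaluates it, and pi-reuse style exploration, in which the agent follows a previously trained policy with probability psi and otherwise acts epsilon-greedily on its current Q-values. The function names, the parameters (psi, epsilon, gamma), and the representation of Q-values as plain lists are all hypothetical and chosen only for illustration.

```python
import random


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target for one transition (illustrative sketch).

    The online network's Q-values pick the next action; the target
    network's Q-values supply the value of that action.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def ppr_select_action(state, q_values, past_policy, psi, epsilon, rng=random):
    """Probabilistic policy reuse action selection (illustrative sketch).

    With probability psi, reuse the action suggested by a past policy.
    Otherwise, act epsilon-greedily on the current Q-values.
    past_policy is a hypothetical callable mapping a state to an action index.
    """
    if rng.random() < psi:
        return past_policy(state)
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Online values prefer action 1, so the target network's value for action 1 is used.
print(double_dqn_target(1.0, [0.1, 0.9], [5.0, 2.0], gamma=0.5))  # prints 2.0
```

In a continual-learning setting such as the one described above, psi would typically start high when a new noise pattern is encountered and decay as the new policy improves, so that knowledge from earlier noise patterns guides early exploration without constraining the final decoder. That schedule is an assumption here and is not specified in the abstract.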
