The development of quantum machine learning (QML) has attracted considerable interest recently, driven by advances in both quantum computing (QC) and machine learning (ML). Reinforcement learning (RL) is one of the ML paradigms that can be used to tackle challenging sequential decision-making problems, and classical RL has been shown to solve many difficult tasks successfully. A leading approach to building quantum RL agents relies on variational quantum circuits (VQCs). However, training QRL algorithms with VQCs demands a significant amount of computational resources, which hinders the exploration of various QRL applications. In this paper, we address this challenge through asynchronous training of QRL agents. Specifically, we adopt asynchronous training of advantage actor-critic variational quantum policies. We demonstrate through numerical simulations that, for the tasks considered, asynchronously trained QRL agents achieve performance comparable to or better than that of classical agents with similar model sizes and architectures.
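The following is a minimal, hedged sketch (not the paper's implementation) of the asynchronous advantage actor-critic training loop described above, where several workers apply Hogwild-style gradient updates to shared parameters. The `variational_layer`, `ToyEnv`, and the worker hyperparameters are placeholder assumptions for illustration; in the paper's setting the placeholder layer would be replaced by a variational quantum circuit.

```python
"""Illustrative sketch of asynchronous advantage actor-critic training.
The `variational_layer` is a classical stand-in for a VQC; `ToyEnv` and
all hyperparameters are made up for this example."""
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class ToyEnv:
    """Hypothetical 2-state environment: reward +1 for choosing the action
    that differs from the current state; episodes last a fixed horizon."""
    def __init__(self, horizon=8):
        self.horizon = horizon

    def reset(self):
        self.t, self.state = 0, 0
        return self.state

    def step(self, action):
        reward = 1.0 if action != self.state else 0.0
        self.state = 1 - self.state
        self.t += 1
        return self.state, reward, self.t >= self.horizon


class ActorCritic(nn.Module):
    """Tiny actor-critic; the middle layer is a classical placeholder for the VQC."""
    def __init__(self, n_states=2, n_actions=2, width=4):
        super().__init__()
        self.encoder = nn.Embedding(n_states, width)
        # Placeholder for the variational quantum circuit (shallow ansatz).
        self.variational_layer = nn.Sequential(nn.Linear(width, width), nn.Tanh())
        self.policy_head = nn.Linear(width, n_actions)
        self.value_head = nn.Linear(width, 1)

    def forward(self, state):
        h = self.variational_layer(self.encoder(state))
        return torch.log_softmax(self.policy_head(h), dim=-1), self.value_head(h)


def worker(shared_model, episodes=200, gamma=0.9):
    """Each worker steps its own environment and asynchronously updates the
    shared parameters (Hogwild-style), as in A3C."""
    opt = torch.optim.SGD(shared_model.parameters(), lr=0.05)
    env = ToyEnv()
    for _ in range(episodes):
        state, done = env.reset(), False
        log_probs, values, rewards = [], [], []
        while not done:
            logp, value = shared_model(torch.tensor(state))
            action = torch.distributions.Categorical(logits=logp).sample()
            state, reward, done = env.step(action.item())
            log_probs.append(logp[action])
            values.append(value.squeeze())
            rewards.append(reward)
        # Discounted returns and advantages for the finished episode.
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.insert(0, G)
        advantages = torch.tensor(returns) - torch.stack(values)
        # Policy gradient loss plus squared-error value loss.
        loss = (-torch.stack(log_probs) * advantages.detach()).sum() + advantages.pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    model = ActorCritic()
    model.share_memory()  # parameters live in shared memory across workers
    procs = [mp.Process(target=worker, args=(model,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because each worker holds its own optimizer over the shared parameters and updates them without locking, the setup mirrors the asynchronous scheme of A3C; swapping the placeholder layer for a VQC (e.g., via a quantum ML library) would yield a hybrid quantum-classical version of the same loop.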