Deep Reinforcement Learning (RL) has considerably advanced over the past decade. At the same time, state-of-the-art RL algorithms require a large computational budget in terms of training time to converge. Recent work has started to approach this problem through the lens of quantum computing, which promises theoretical speed-ups for several traditionally hard tasks. In this work, we examine a class of hybrid quantum-classical RL algorithms that we collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show that VQ-DQN approaches are subject to instabilities that cause the learned policy to diverge, study the extent to which this affects the reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Additionally, and in contrast to most existing work on quantum reinforcement learning, we execute RL algorithms on an actual quantum processing unit (an IBM Quantum Device) and investigate differences in behaviour between simulated and physical quantum systems that suffer from implementation deficiencies. Our experiments show that, contrary to claims in the literature, it cannot be conclusively decided whether known quantum approaches, even if simulated without physical imperfections, provide an advantage over classical approaches. Finally, we provide a robust, universal and well-tested implementation of VQ-DQN as a reproducible testbed for future experiments.
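To make the class of algorithms concrete, the following is a minimal, illustrative sketch of a variational quantum Q-network as it is commonly formulated in the VQ-DQN literature: a parameterised quantum circuit encodes the classical state, and its measured expectation values are read out as Q-values inside an otherwise standard deep Q-learning loop. The sketch assumes PennyLane with the PyTorch interface; the qubit count, ansatz, layer depth and all hyperparameters are illustrative assumptions and not taken from the implementation described in this work.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a variational
# quantum Q-network for a small discrete-action environment such as CartPole.
import pennylane as qml
import torch

n_qubits = 4    # one qubit per state feature (illustrative)
n_layers = 2    # depth of the variational ansatz (illustrative)
n_actions = 2   # one Q-value per action, read from single-qubit expectations

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def q_circuit(state, weights):
    # Encode the classical state into qubit rotations (angle encoding).
    qml.AngleEmbedding(state, wires=range(n_qubits))
    # Trainable entangling layers play the role of the hidden layers of a DQN.
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # One expectation value per action serves as the (unscaled) Q-value.
    return [qml.expval(qml.PauliZ(w)) for w in range(n_actions)]

# Trainable circuit parameters, optimised classically by gradient descent.
weight_shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = torch.nn.Parameter(torch.randn(weight_shape))

state = torch.tensor([0.1, -0.2, 0.05, 0.0])
q_values = q_circuit(state, weights)  # fed into a standard temporal-difference loss
```

In a full VQ-DQN agent, a circuit of this kind replaces the neural network of a classical DQN, while experience replay, target networks and the temporal-difference loss remain classical.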