Deep Reinforcement Learning (RL) has advanced considerably over the past decade. At the same time, state-of-the-art RL algorithms require a large computational budget, in terms of training time, to converge. Recent work has started to approach this problem through the lens of quantum computing, which promises theoretical speed-ups for several traditionally hard tasks. In this work, we examine a class of hybrid quantum-classical RL algorithms that we collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show that VQ-DQN approaches are subject to instabilities that cause the learned policy to diverge, study the extent to which this affects the reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Additionally, and in contrast to most existing work on quantum reinforcement learning, we execute RL algorithms on an actual quantum processing unit (an IBM Quantum Device) and investigate differences in behaviour between simulated and physical quantum systems that suffer from implementation deficiencies. Our experiments show that, contrary to claims in the literature, it cannot be conclusively decided whether known quantum approaches, even if simulated without physical imperfections, can provide an advantage compared to classical approaches. Finally, we provide a robust, universal and well-tested implementation of VQ-DQN as a reproducible testbed for future experiments.
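To illustrate the basic VQ-DQN idea, the sketch below replaces the classical neural Q-network of a DQN agent with a parameterized (variational) quantum circuit whose expectation values serve as Q-value estimates. This is a minimal illustration only, not the ansatz used in this work: the encoding scheme, layer structure, two-action readout, and the use of PennyLane are all assumptions made for brevity.

```python
# Minimal sketch: a variational quantum circuit as a Q-function approximator.
# Illustrative only; the actual VQ-DQN ansatz and framework may differ.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
n_layers = 2
dev = qml.device("default.qubit", wires=n_qubits)  # classical simulator backend

@qml.qnode(dev)
def q_values(state, weights):
    # Encode the classical environment state into single-qubit rotation angles.
    for i in range(n_qubits):
        qml.RY(state[i], wires=i)
    # Variational layers: trainable rotations followed by entangling CNOTs.
    for layer in weights:
        for i in range(n_qubits):
            qml.RY(layer[i], wires=i)
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
    # One expectation value per action approximates Q(s, a) for a 2-action task.
    return [qml.expval(qml.PauliZ(i)) for i in range(2)]

# Trainable parameters; in a full agent these are updated from the DQN loss.
weights = np.random.uniform(0, 2 * np.pi, size=(n_layers, n_qubits), requires_grad=True)
state = np.array([0.1, -0.3, 0.5, 0.0])
print(q_values(state, weights))  # two Q-value estimates for the given state
```

In a complete agent, these circuit outputs would feed the usual DQN machinery (replay buffer, target network, temporal-difference loss), with gradients of the circuit parameters obtained via parameter-shift rules or classical simulation.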