With the development of experimental quantum technology, quantum control has attracted increasing attention due to the realization of controllable artificial quantum systems. However, because quantum-mechanical systems are often too difficult to deal with analytically, heuristic strategies and numerical algorithms that search for proper control protocols are adopted, and deep learning, especially deep reinforcement learning (RL), is a promising generic candidate for solving such control problems. Although there have been a few successful applications of deep RL to quantum control problems, most of the existing RL algorithms suffer from instabilities and unsatisfactory reproducibility and require a large amount of fine-tuning and a large computational budget, both of which limit their applicability. To resolve the issue of instabilities, in this dissertation we investigate the non-convergence issue of Q-learning. We then examine the weaknesses of existing convergent approaches and develop a new convergent Q-learning algorithm, which we call the convergent deep Q network (C-DQN) algorithm, as an alternative to the conventional deep Q network (DQN) algorithm. We prove the convergence of C-DQN and apply it to the Atari 2600 benchmark, showing that C-DQN still learns successfully in cases where DQN fails. We then apply the algorithm to the measurement-feedback cooling problems of a quantum quartic oscillator and a trapped quantum rigid body. We establish the physical models and analyse their properties, and we show that although both C-DQN and DQN can learn to cool the systems, C-DQN tends to behave more stably, and when DQN suffers from instabilities, C-DQN achieves better performance. As the performance of DQN can have a large variance and lack consistency, C-DQN can be a better choice for research on complicated control problems.