In recent years, there has been growing interest in meta-reinforcement learning methods for traffic signal control, which have outperformed traditional control methods. However, previous methods lack robustness in adaptation and stability during training in complex situations, which largely limits their application to real-world traffic signal control. In this paper, we propose BM-DQN, a novel value-based Bayesian meta-reinforcement learning framework that robustly speeds up learning in new scenarios by utilizing well-trained prior knowledge learned from existing scenarios. The framework builds on our proposed fast-adaptation variant of Gradient-EM Bayesian meta-learning and on the fast-update advantage of DQN, which together allow fast adaptation to new scenarios with continual learning ability and robustness to uncertainty. Experiments on restricted 2D navigation and traffic signal control show that the proposed framework adapts more quickly and robustly to new scenarios than previous methods and, in particular, exhibits much better continual learning ability in heterogeneous scenarios.