In this paper we revisit some of the fundamental premises for a reinforcement learning (RL) approach to self-learning traffic lights. We propose RLight, a combination of choices that offers robust performance and good generalization to unseen traffic flows. In particular, our main contributions are threefold: our lightweight and cluster-aware state representation leads to improved performance; we reformulate the MDP such that it skips redundant timesteps of yellow light, speeding up learning by 30%; and we investigate the action space and provide insight into the performance difference between acyclic and cyclic phase transitions. Additionally, we provide insights into the generalization of the methods to unseen traffic. Evaluations using the real-world Hangzhou traffic dataset show that RLight outperforms state-of-the-art rule-based and deep reinforcement learning algorithms, demonstrating the potential of RL-based methods to improve urban traffic flows.
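The MDP reformulation mentioned above, in which redundant yellow-light timesteps are skipped, can be illustrated with a minimal sketch. This is not the paper's implementation; the environment interface, the `YELLOW_STEPS` constant, and the `simulate_tick` helper are all illustrative assumptions. The point is that when the agent switches phases, the mandatory yellow interval is simulated internally and its rewards accumulated, so each agent step lands on a genuine decision point.

```python
# Hypothetical sketch of an MDP step that folds yellow-light ticks into a
# single agent transition, so the agent never learns over redundant yellow
# timesteps. All names here are illustrative, not from the paper.

YELLOW_STEPS = 3  # assumed fixed yellow duration, in simulator ticks


def simulate_tick(phase):
    """Stand-in for one simulator tick; returns a per-tick reward.

    Here we simply penalize yellow ticks to make the accumulation visible.
    """
    return -1.0 if phase == "yellow" else 0.0


def step(current_phase, action_phase):
    """Advance the environment to the next decision point.

    If the chosen phase differs from the current one, the mandatory yellow
    interval is executed inside this call and its per-tick rewards are
    summed, so the agent observes one transition instead of several.
    Returns (new_phase, accumulated_reward, simulator_ticks_elapsed).
    """
    reward = 0.0
    ticks = 0
    if action_phase != current_phase:
        for _ in range(YELLOW_STEPS):
            reward += simulate_tick("yellow")
            ticks += 1
    reward += simulate_tick(action_phase)
    ticks += 1
    return action_phase, reward, ticks
```

With these assumptions, a phase switch consumes four simulator ticks (three yellow plus one of the new phase) but only one agent step, which is the mechanism behind the reported learning speedup.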