Intelligent Transportation Systems are increasingly leveraging greater sensor coverage and computing power to deliver data-intensive solutions that outperform traditional systems. Within Traffic Signal Control (TSC), this has enabled the emergence of Machine Learning (ML) based systems, among which Reinforcement Learning (RL) approaches have performed particularly well. Given the lack of industry standards for ML in TSC, the literature on RL often lacks both comparisons against commercially available systems and straightforward formulations of how the agents operate. Here we attempt to bridge that gap. We propose three different architectures for TSC RL agents, provide pseudo-code for them, and compare them against the commercial systems currently in use: MOVA, SurTrac, and cyclic controllers. The agents use variations of Deep Q-Learning and Actor-Critic, with states and rewards based on queue lengths. Their performance is compared across different map scenarios with variable demand, assessed in terms of global delay and average queue length. We find that the RL-based systems achieve significantly and consistently lower delays than the existing commercial systems.
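To make the queue-length-based formulation concrete, the following is a minimal sketch, assuming a simple design in the spirit of the abstract rather than the authors' exact definitions: the state concatenates per-approach queue lengths with a one-hot encoding of the active signal phase, and the reward is the negative total queue length. The helper names (build_state, reward) and the example numbers are hypothetical.

import numpy as np

def build_state(queue_lengths, current_phase, n_phases):
    """State vector: per-approach queue lengths plus a one-hot of the active phase.

    This is an illustrative assumption, not the paper's exact state definition.
    """
    phase_onehot = np.zeros(n_phases)
    phase_onehot[current_phase] = 1.0
    return np.concatenate([np.asarray(queue_lengths, dtype=float), phase_onehot])

def reward(queue_lengths):
    """Reward: negative total queue length, so shorter queues yield higher reward."""
    return -float(np.sum(queue_lengths))

# Example usage with illustrative numbers (vehicles queued on four approaches).
s = build_state(queue_lengths=[3, 0, 7, 2], current_phase=1, n_phases=4)
r = reward([3, 0, 7, 2])
print(s, r)  # prints the state vector and reward = -12.0

A state and reward of this form could feed either a Deep Q-Learning or an Actor-Critic agent; the choice only changes how the policy is derived from the observations, not the observations themselves.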