Applying reinforcement learning (RL) to traffic signal control (TSC) has become a promising approach. However, most RL-based methods focus solely on optimization within simulators and give little thought to real-world deployment. Online RL methods must interact with the environment to learn, yet such interaction is limited at real-world intersections. Additionally, acquiring a dataset for offline RL is challenging in the real world. Moreover, most real-world intersections prefer a cyclical phase structure. To address these challenges, we propose: (1) a cyclical offline dataset (COD), designed around common real-world scenarios so that it is easy to collect; (2) an offline RL model called DataLight, capable of learning satisfactory control strategies from the COD; and (3) a method called Arbitrary To Cyclical (ATC), which can transform most RL-based methods into cyclical signal control. Extensive experiments using real-world datasets on simulators demonstrate that: (1) DataLight outperforms most existing methods and achieves results comparable to those of the best-performing method; (2) introducing ATC into some recent RL-based methods achieves satisfactory performance; and (3) COD is reliable, with DataLight remaining robust even with a small amount of data. These results suggest that the cyclical offline dataset might be enough for offline RL for TSC. Our proposed methods make significant contributions to the TSC field and successfully bridge the gap between simulation experiments and real-world applications. Our code is released on GitHub.
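To make the idea of a cyclical phase structure concrete, the following minimal Python sketch shows one generic way to constrain an arbitrary phase-selection policy to a fixed cycle. It is an illustrative assumption, not the ATC method itself, whose details are not given in this abstract. The function name `cyclical_action`, the `scores` input (for example, per-phase Q-values), and the keep-or-advance rule are all hypothetical.

```python
from typing import Sequence


def cyclical_action(scores: Sequence[float], current_phase: int) -> int:
    """Pick the next phase under a fixed cyclic order (hypothetical rule).

    `scores` holds one value per phase, e.g. Q-values from any policy that
    would otherwise be free to jump to an arbitrary phase. Here the choice
    is restricted to two options: keep the current phase, or advance to the
    next phase in the cycle. This is an assumed constraint for illustration.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must contain at least one phase")
    if not 0 <= current_phase < n:
        raise ValueError("current_phase is out of range")

    # The cycle wraps around: after the last phase comes phase 0.
    next_phase = (current_phase + 1) % n

    # Advance only if the next phase scores strictly higher; ties keep
    # the current phase.
    if scores[next_phase] > scores[current_phase]:
        return next_phase
    return current_phase


if __name__ == "__main__":
    # Phase 2 has the highest score, but from phase 0 only phase 1 is reachable.
    print(cyclical_action([0.1, 0.2, 0.9, 0.3], current_phase=0))
```

In this sketch the unconstrained policy would jump straight to phase 2, while the constrained version can only move one step along the cycle. A practical controller would also need minimum and maximum green times, which are omitted here.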