Traffic Signal Control: Can Offline Reinforcement Learning Solve Real-World Problems? (Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning)

Applying reinforcement learning (RL) to traffic signal control (TSC) has become a promising solution. However, most RL-based methods focus solely on optimization within simulators and give little thought to deployment issues in the real world. Online RL-based methods require interaction with the environment, and such interaction is limited in the real world. Additionally, acquiring an offline dataset for offline RL is challenging in the real world. Moreover, most real-world intersections prefer a cyclical phase structure. To address these challenges, we propose: (1) a cyclical offline dataset (COD), designed around common real-world scenarios so that it is easy to collect; (2) an offline RL model called DataLight, capable of learning satisfactory control strategies from the COD; and (3) a method called Arbitrary To Cyclical (ATC), which can transform most RL-based methods into cyclical signal control. Extensive simulator experiments on real-world datasets demonstrate that: (1) DataLight outperforms most existing methods and achieves results comparable to the best-performing method; (2) introducing ATC into some recent RL-based methods achieves satisfactory performance; and (3) COD is reliable, with DataLight remaining robust even with a small amount of data. These results suggest that the cyclical offline dataset might be enough for offline RL for TSC. Our proposed methods make significant contributions to the TSC field and successfully bridge the gap between simulation experiments and real-world applications. Our code is released on GitHub.
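The abstract does not say how ATC works internally. Purely as an illustration of what restricting an arbitrary-phase policy to a cyclical phase order could look like, the sketch below takes per-phase scores (for example, Q-values from some RL model) and allows only two outcomes: keep the current phase or advance to the next phase in a fixed cycle. The function name `cyclical_action`, the score input, and the keep-or-advance rule are all assumptions made here for illustration; this is not the paper's ATC method.

```python
from typing import Sequence


def cyclical_action(phase_scores: Sequence[float], current_phase: int) -> int:
    """Pick the next phase under a fixed cyclical order.

    Hypothetical rule (an assumption, not the paper's ATC): the controller
    may only keep the current phase or advance to the next phase in the
    cycle, and it advances only if that next phase scores strictly higher.
    """
    n = len(phase_scores)
    if n == 0:
        raise ValueError("phase_scores must not be empty")
    if not 0 <= current_phase < n:
        raise ValueError("current_phase is out of range")

    # The only phase reachable besides the current one is its successor.
    next_phase = (current_phase + 1) % n

    if phase_scores[next_phase] > phase_scores[current_phase]:
        return next_phase
    return current_phase


if __name__ == "__main__":
    # Phase 2 has the highest score, but it is not next in the cycle,
    # so from phase 0 the controller cannot jump to it directly.
    print(cyclical_action([0.2, 0.1, 0.9, 0.0], current_phase=0))
    # From the last phase, the cycle wraps around to phase 0.
    print(cyclical_action([0.9, 0.1, 0.1, 0.2], current_phase=3))
```

The point of the sketch is only that a cyclical constraint shrinks the action set from "any phase" to "keep or advance", which is one reason an arbitrary-phase policy needs some conversion step before it fits a cyclical intersection.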

