Model-based graph reinforcement learning for inductive traffic signal control

Most reinforcement learning methods for adaptive traffic signal control must be trained from scratch before they can be applied to a new intersection, or after any change to the road network, traffic distribution, or behavioral constraints experienced during training. Considering that 1) training such methods requires a massive amount of experience, and 2) this experience must be gathered by interacting in an exploratory fashion with real road network users, this lack of transferability limits experimentation and applicability. Recent approaches learn policies that generalize to unseen road network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between learning cyclic policies (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by also generalizing to the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.
