Learning How to Dynamically Route Autonomous Vehicles on Shared Roads

Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. In a more realistic scenario, we consider a planner that controls autonomous cars, which are a fraction of all present cars. We study a dynamic routing game, in which the route choices of autonomous cars can be controlled and the human drivers react selfishly and dynamically. As the problem is prohibitively large, we use deep reinforcement learning to learn a policy for controlling the autonomous vehicles. This policy indirectly influences human drivers to route themselves in such a way that minimizes congestion on the network. To gauge the effectiveness of our learned policies, we establish theoretical results characterizing equilibria and empirically compare the learned policy results with best possible equilibria. We prove properties of equilibria on parallel roads and provide a polynomial-time optimization for computing the most efficient equilibrium. Moreover, we show that in the absence of these policies, high demand and network perturbations would result in large congestion, whereas using the policy greatly decreases the travel times by minimizing the congestion. To the best of our knowledge, this is the first work that employs deep reinforcement learning to reduce congestion by indirectly influencing humans' routing decisions in mixed-autonomy traffic.
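
To make the idea of indirectly influencing selfish drivers concrete, here is a minimal static sketch. It is not the paper's dynamic routing game or its learned policy. It assumes a hypothetical two-road parallel network with unit demand, where road 1 has latency equal to its flow and road 2 has a constant latency `c`. The function names, the latency functions, and the grid search are all illustrative assumptions.

```python
def total_travel_time(alpha, a2, demand=1.0, c=1.0):
    """Total travel time when the planner sends autonomous flow a2 to road 2.

    Assumed toy model (not from the paper): road 1 latency is its flow x1,
    road 2 latency is the constant c. alpha is the autonomous fraction.
    """
    autonomous = alpha * demand
    humans = demand - autonomous
    a1 = autonomous - a2  # autonomous flow left on road 1
    # Selfish humans take road 1 while its latency stays below c,
    # and any remaining humans spill over to road 2.
    h1 = min(humans, max(0.0, c - a1))
    h2 = humans - h1
    x1, x2 = a1 + h1, a2 + h2
    return x1 * x1 + x2 * c


def best_plan(alpha, demand=1.0, c=1.0, steps=1000):
    """Grid-search the autonomous flow on road 2 that minimizes total time."""
    autonomous = alpha * demand
    candidates = [i / steps * autonomous for i in range(steps + 1)]
    best_a2 = min(candidates, key=lambda a2: total_travel_time(alpha, a2, demand, c))
    return best_a2, total_travel_time(alpha, best_a2, demand, c)


if __name__ == "__main__":
    for alpha in (0.0, 0.2, 0.5, 1.0):
        a2, cost = best_plan(alpha)
        print(f"alpha={alpha:.1f}  autonomous on road 2={a2:.3f}  total time={cost:.3f}")
```

In this toy setting, the total travel time does not increase as the controlled fraction grows, and it stops improving once enough autonomous flow is available to reach the best achievable split. The dynamic setting studied here is far larger, which is why a learned policy is used instead of an exhaustive search.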
