Autonomous Mobility-on-Demand (AMoD) systems are a rapidly evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests. The control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to address the open challenges in this space. However, current RL-based approaches focus exclusively on learning from online data, fundamentally ignoring the per-sample cost of interacting with real-world transportation systems. To address these limitations, we propose to formalize the control of AMoD systems through the lens of offline reinforcement learning and to learn effective control strategies solely from offline data, which is readily available to current mobility operators. We further investigate design decisions and provide experiments on real-world mobility systems showing that offline learning recovers AMoD control policies that (i) exhibit performance on par with online methods, (ii) drastically improve data efficiency, and (iii) completely eliminate the need for complex simulated environments. Crucially, this paper demonstrates that offline reinforcement learning is a promising paradigm for applying RL-based solutions within economically critical systems, such as mobility systems.
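To make the offline setting concrete, the sketch below illustrates one common offline RL recipe: Q-learning on a fixed log of past dispatching decisions with a conservative (CQL-style) penalty that discourages overestimating actions absent from the data. All dimensions, the synthetic dataset, and the network architecture are illustrative assumptions, not the paper's actual method or experimental setup.

```python
# A minimal sketch of offline Q-learning for AMoD rebalancing, assuming a
# discrete action space (e.g., which region to send idle vehicles to) and a
# logged dataset of (state, action, reward, next_state) transitions.
import torch
import torch.nn as nn

STATE_DIM = 8      # e.g., per-region idle vehicles + open requests (assumed)
N_REGIONS = 4      # discrete rebalancing targets (assumed)
GAMMA = 0.99       # discount factor
ALPHA = 1.0        # weight of the conservative penalty (assumed)

# Synthetic stand-in for a fixed log of past operator decisions; a real
# deployment would load historical trip data instead of random tensors.
N = 10_000
states = torch.randn(N, STATE_DIM)
actions = torch.randint(0, N_REGIONS, (N,))
rewards = torch.randn(N)
next_states = torch.randn(N, STATE_DIM)

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_REGIONS)
)
optim = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for step in range(1_000):
    idx = torch.randint(0, N, (256,))            # minibatch from static dataset
    s, a, r, s2 = states[idx], actions[idx], rewards[idx], next_states[idx]

    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) of logged actions
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(dim=1).values  # 1-step bootstrap

    td_loss = nn.functional.mse_loss(q, target)
    # Conservative penalty: push down Q-values of actions not seen in the log,
    # relative to the logged ones, to curb out-of-distribution overestimation.
    cql_penalty = (torch.logsumexp(q_net(s), dim=1) - q).mean()
    loss = td_loss + ALPHA * cql_penalty

    optim.zero_grad()
    loss.backward()
    optim.step()
```

Note that the entire loop samples only from the fixed dataset; no simulator or live fleet interaction is required, which is precisely the property the abstract highlights for economically critical systems.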