The development of mobility-on-demand services, rich transportation data sources, and autonomous vehicles (AVs) creates significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. SAMS fleet operation involves multiple interrelated decisions, with a primary focus on efficiently fulfilling passenger ride requests with a high level of service quality. This paper focuses on improving the efficiency and service quality of a SAMS vehicle fleet via anticipatory repositioning of idle vehicles. The rebalancing problem is formulated as a Markov decision process, which we propose solving with an advantage actor-critic (A2C) reinforcement learning method. The proposed approach learns a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The approach allows for centralized repositioning decisions and can handle large vehicle fleets, since the problem size does not change with the fleet size. Two versions of the A2C AV repositioning approach are tested using New York City taxi data and an agent-based simulation tool. The first version, A2C-AVR(A), learns to anticipate future demand based on past observations, while the second, A2C-AVR(B), uses demand forecasts. Compared with an optimization-based rebalancing approach, both models show a significant reduction in mean passenger waiting times, at the cost of a slightly higher percentage of empty fleet miles travelled. The experiments demonstrate the model's ability to anticipate future demand and its transferability to cases unseen at the training stage.
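For readers unfamiliar with A2C, the generic update quantities can be sketched as follows. This is a minimal illustration of the textbook advantage actor-critic losses, not the A2C-AVR implementation: the abstract does not specify the state, action, reward, or network design, so the function names, the discount factor, and the example numbers below are all assumptions. The reward could, for instance, be a negative waiting-time signal, but that too is an assumption here.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float],
                       bootstrap_value: float,
                       gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1}, seeded with the critic's
    value estimate for the state that follows the last step."""
    returns: List[float] = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs: List[float],
               values: List[float],
               rewards: List[float],
               bootstrap_value: float,
               gamma: float = 0.99) -> Tuple[float, float]:
    """Compute the standard A2C actor and critic losses for one rollout.

    log_probs: log pi(a_t | s_t) of the actions actually taken
    values:    critic estimates V(s_t)
    rewards:   per-step rewards (hypothetical; any scalar signal works)
    """
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    # Advantage: how much better the observed return was than the
    # critic's expectation for that state.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between V(s_t) and the return.
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    actor, critic = a2c_losses(
        log_probs=[-1.0, -2.0, -0.5],
        values=[1.0, 1.0, 1.0],
        rewards=[1.0, 0.0, 2.0],
        bootstrap_value=0.0,
        gamma=0.5,
    )
    print(actor, critic)
```

In a full implementation the advantages would be treated as constants when differentiating the actor loss, and both losses would be minimized by gradient descent on the policy and value networks. Nothing in this sketch depends on the number of vehicles, which is in keeping with the abstract's remark that the problem size does not change with the fleet size.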