Online ride-hailing services have become a prevalent mode of transportation across the world. In this paper, we study the challenging problem of how to direct vacant taxis around a city so that supply and demand are balanced in online ride-hailing services. We design a new reward scheme that considers multiple performance metrics of online ride-hailing services. We also propose a novel deep reinforcement learning method named Deep-Q-Network with Action Mask (AM-DQN), which masks off unnecessary actions in various locations so that agents can learn much faster and more efficiently. We conduct extensive experiments using a city-scale dataset from Chicago. Several popular heuristic and learning methods are also implemented as baselines for comparison. The experimental results show that AM-DQN attains the best performance among all methods with respect to average failure rate, average waiting time for customers, and average idle search time for vacant taxis.
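To make the action-mask idea concrete, the following is a minimal sketch of masked action selection over a vector of Q-values. It is not the AM-DQN implementation: the abstract does not specify how the mask is built or applied, so the function names, the boolean-mask representation, and the epsilon-greedy exploration rule are all assumptions made for illustration.

```python
import random


def allowed_actions(action_mask):
    """Return the indices of actions the mask permits (hypothetical helper)."""
    allowed = [i for i, ok in enumerate(action_mask) if ok]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    return allowed


def masked_greedy_action(q_values, action_mask):
    """Pick the highest-valued action among those the mask permits."""
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    # Masked-off actions are never candidates, whatever their Q-value is.
    return max(allowed_actions(action_mask), key=lambda i: q_values[i])


def masked_epsilon_greedy(q_values, action_mask, epsilon, rng=random):
    """Explore uniformly over permitted actions with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(allowed_actions(action_mask))
    return masked_greedy_action(q_values, action_mask)


# Assumed example: a taxi in a corner cell whose second action would leave
# the map, so that action is masked off even though its Q-value is largest.
q = [1.0, 5.0, 3.0]
mask = [True, False, True]
best = masked_greedy_action(q, mask)
```

In this assumed example the second action has the largest Q-value but is masked, so the greedy choice falls on the third action. Restricting exploration to permitted actions is one plausible reason masking could speed up learning, since the agent spends no samples on moves that are ruled out in advance.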