In this paper, we consider same-day delivery with vehicles and drones. Customers request deliveries over the course of the day, and the dispatcher dynamically dispatches vehicles and drones to deliver the goods before each customer's delivery deadline. Vehicles can deliver multiple packages in one route but travel relatively slowly due to urban traffic. Drones travel faster, but they have limited capacity and require charging or battery swaps. To exploit the different strengths of the two fleets, we propose a deep Q-learning approach. Our method learns the value of assigning a new customer to either a drone or a vehicle, as well as the value of not offering service at all. In a systematic computational analysis, we show that our policy outperforms benchmark policies and demonstrate the effectiveness of our deep Q-learning approach. We also show that our policy remains effective when the fleet size changes moderately. Experiments on data drawn from varied spatial and temporal distributions demonstrate that our trained policies can cope with changes in the input data.
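To make the three-way decision concrete, the following is a minimal sketch of how a learned action-value estimate could be used to assign a new request. It is not the implementation from this paper: the action labels, the `feasible` set, the function names, and the parameters `epsilon` and `gamma` are all illustrative assumptions, and the Q-values are passed in as a plain dictionary where a trained network would normally supply them. The second function shows the standard one-step Q-learning target.

```python
import random

# Hypothetical action labels; the abstract only names these three options.
ACTIONS = ("vehicle", "drone", "no_service")


def choose_action(q_values, feasible, epsilon=0.0, rng=random):
    """Pick an assignment for a new request, epsilon-greedily.

    q_values: dict mapping each action to its estimated value
              (assumed to come from a Q-network in practice).
    feasible: set of fleet actions that could still meet the deadline
              (an assumed input; how feasibility is checked is not shown).
    Declining service is always treated as an allowed action.
    """
    candidates = [a for a in ACTIONS if a in feasible or a == "no_service"]
    if rng.random() < epsilon:
        # Explore: choose uniformly among the allowed actions.
        return rng.choice(candidates)
    # Exploit: choose the allowed action with the highest estimated value.
    return max(candidates, key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=1.0, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values.values())
```

With `epsilon=0.0` the choice is purely greedy, so a request for which only the vehicle fleet is feasible is assigned to a vehicle even when the drone estimate is higher, and a request with no feasible fleet is declined.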