We consider a meal delivery service fulfilling dynamic customer requests with a given set of couriers over the course of a day. A courier's duty is to pick up an order from a restaurant and deliver it to a customer. We model this service as a Markov decision process and use deep reinforcement learning as the solution approach. We experiment with the resulting policies on synthetic and real-world datasets and compare them against baseline policies. We also examine courier utilization for different fleet sizes. In our analysis, we focus specifically on the impact of limited resources in the meal delivery problem. Furthermore, we investigate the effect of intelligent order rejection and courier repositioning. Our numerical experiments show that, by incorporating the geographical locations of the restaurants, customers, and the depot, our model significantly improves overall service quality, as measured by the expected total reward and delivery times. Our results provide valuable insights into both the courier assignment process and the optimal number of couriers for different order frequencies on a given day. The proposed model also shows robust performance under a variety of scenarios, supporting real-world implementation.
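To make the Markov decision process concrete, the following is a minimal, hypothetical sketch of a state and action space for the courier assignment setting described above. All names and fields (`Order`, `State`, `feasible_actions`) are illustrative assumptions, not the paper's exact formulation; in particular, the state tracks courier locations and availability, and the action for a pending order is either an assignment to an idle courier or a rejection (modeled as `None`).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch of the meal-delivery MDP state; field names are
# assumptions for illustration, not the authors' exact model.

@dataclass
class Order:
    restaurant: Tuple[float, float]  # pickup location (x, y)
    customer: Tuple[float, float]    # drop-off location (x, y)
    placed_at: float                 # time the order arrived

@dataclass
class State:
    time: float
    courier_locations: List[Tuple[float, float]]
    courier_busy_until: List[float]  # time each courier becomes free
    pending_orders: List[Order]

def feasible_actions(state: State) -> List[Optional[int]]:
    """Actions for the next pending order: assign it to an idle courier
    (given by index) or reject it (None), mirroring the intelligent
    order-rejection option discussed in the abstract."""
    idle = [i for i, t in enumerate(state.courier_busy_until)
            if t <= state.time]
    return idle + [None]  # None models rejecting the order
```

A learned policy would then map such a state to one of these actions; the decision of which idle courier to choose (or whether to reject) is where the geographical locations of restaurants, customers, and the depot enter the picture.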