A Deep Reinforcement Learning Approach for the Meal Delivery Problem

We consider a meal delivery service that fulfills dynamic customer requests with a given set of couriers over the course of a day. A courier's duty is to pick up an order from a restaurant and deliver it to a customer. We model this service as a Markov decision process and use deep reinforcement learning as the solution approach. We evaluate the resulting policies on synthetic and real-world datasets and compare them with baseline policies. We also examine courier utilization for different numbers of couriers. In our analysis, we focus specifically on the impact of limited available resources in the meal delivery problem. Furthermore, we investigate the effect of intelligent order rejection and repositioning of the couriers. Our numerical experiments show that, by incorporating the geographical locations of the restaurants, customers, and the depot, our model significantly improves the overall service quality as characterized by the expected total reward and the delivery times. Our results provide valuable insights into both the courier assignment process and the optimal number of couriers for different order frequencies on a given day. The proposed model also performs robustly under a variety of scenarios for real-world implementation.
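The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of how one decision step of such a Markov decision process might look. Every name and constant here (`Courier`, `Order`, `step`, `SPEED`, `TARGET_MINUTES`, `REJECT_PENALTY`, the Manhattan-distance travel model) is a hypothetical assumption for illustration, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# All constants below are illustrative assumptions, not values from the paper.
SPEED = 1.0             # grid units travelled per minute
TARGET_MINUTES = 45.0   # reward is positive when delivery beats this target
REJECT_PENALTY = -15.0  # reward for rejecting an order
REJECT = -1             # action index meaning "reject the order"


@dataclass
class Courier:
    position: Point
    free_at: float  # minute at which the courier finishes its current job


@dataclass
class Order:
    placed_at: float
    restaurant: Point
    customer: Point


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(couriers: List[Courier], order: Order, action: int) -> Tuple[float, Optional[float]]:
    """Apply one decision: assign the order to couriers[action] or reject it.

    Returns (reward, delivery_time_in_minutes); delivery time is None on rejection.
    """
    if action == REJECT:
        return REJECT_PENALTY, None
    courier = couriers[action]
    start = max(courier.free_at, order.placed_at)
    to_restaurant = manhattan(courier.position, order.restaurant) / SPEED
    to_customer = manhattan(order.restaurant, order.customer) / SPEED
    delivered_at = start + to_restaurant + to_customer
    # The courier ends up at the customer and is busy until the drop-off.
    courier.position = order.customer
    courier.free_at = delivered_at
    delivery_time = delivered_at - order.placed_at
    return TARGET_MINUTES - delivery_time, delivery_time
```

Under these assumptions, a policy maps the current courier positions and the incoming order to either a courier index or `REJECT`; a learned policy would replace a hand-written rule such as "assign to the nearest free courier".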


Related content

Deep Reinforcement Learning


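Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The central task in traditional reinforcement learning is for an agent to learn, from the rewards it receives from the environment, the behavior that maximizes reward. Traditional model-free reinforcement learning methods, however, rely on function approximation so that the agent can learn a value function or a policy. In that setting, the strong function approximation ability of deep learning is a natural and well-suited replacement for hand-specified features, and it makes better-performing end-to-end learning possible.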
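As a rough illustration of a neural network acting as the function approximator for a value function, the sketch below implements one semi-gradient Q-learning update on a tiny one-hidden-layer network in plain Python. The class name `TinyQNetwork`, the layer sizes, and the default `gamma` and `lr` values are assumptions made for this example; practical DRL systems typically add components such as replay buffers and target networks, which are omitted here.

```python
import math
import random


class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_actions)] for _ in range(n_hidden)]
        self.b2 = [0.0] * n_actions

    def _forward(self, state):
        hidden = [
            math.tanh(sum(state[i] * self.w1[i][j] for i in range(len(state))) + self.b1[j])
            for j in range(len(self.b1))
        ]
        q = [
            sum(hidden[j] * self.w2[j][a] for j in range(len(hidden))) + self.b2[a]
            for a in range(len(self.b2))
        ]
        return hidden, q

    def q_values(self, state):
        return self._forward(state)[1]

    def td_update(self, state, action, reward, next_state, done, gamma=0.99, lr=0.05):
        """One semi-gradient Q-learning step; returns the TD error before the update."""
        hidden, q = self._forward(state)
        # The bootstrap target is treated as a constant (no gradient flows through it).
        target = reward if done else reward + gamma * max(self.q_values(next_state))
        td_error = target - q[action]
        for j, h in enumerate(hidden):
            # Backpropagate through hidden unit j before its output weight changes.
            grad_pre = td_error * self.w2[j][action] * (1.0 - h * h)
            self.w2[j][action] += lr * td_error * h
            for i, x in enumerate(state):
                self.w1[i][j] += lr * grad_pre * x
            self.b1[j] += lr * grad_pre
        self.b2[action] += lr * td_error
        return td_error
```

Calling `td_update` repeatedly on observed transitions moves the predicted Q-value of the taken action toward the reward-based target, which is the sense in which the network "learns the value function" rather than relying on hand-specified features.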

Deep Learning for Community Detection: Progress, Challenges and Opportunities

专知 (Zhuanzhi) Member Service

28+ reads · June 13, 2020

[CMU PhD thesis] Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

专知 (Zhuanzhi) Member Service

55+ reads · May 26, 2020

Explainable Reinforcement Learning: A Survey

专知 (Zhuanzhi) Member Service

132+ reads · May 14, 2020

[Classic book] Deep Learning - A Practitioner's Approach (532-page PDF)

专知 (Zhuanzhi) Member Service

138+ reads · April 3, 2020