This paper develops an inherently parallelised, fast, approximate learning-based solution to the generic class of Capacitated Vehicle Routing Problems with Time Windows and Dynamic Routing (CVRP-TWDR). Treating the vehicles in a fleet as decentralised agents, we postulate that reinforcement learning (RL) based adaptation is a key enabler for real-time route formation in a dynamic environment. The methodology allows each agent (vehicle) to independently evaluate the value of serving each customer, and uses a centralised allocation heuristic to finalise the allocations based on the generated values. We show that the solutions produced by this method are significantly faster than exact formulations and state-of-the-art meta-heuristics, while remaining reasonably close to optimal in solution quality. We describe experiments in both the static case (where all customer demands and time windows are known in advance) and the dynamic case (where customers can appear at any time during execution). The results with a single trained model on large, out-of-distribution test data demonstrate the scalability and flexibility of the proposed approach.
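The two-step scheme described above (per-vehicle value estimation followed by a centralised allocation over the reported values) can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the scoring rule stands in for the learned RL value function, and all names, the distance-based score, and the greedy commitment order are assumptions introduced here for illustration.

```python
# Hypothetical sketch of the decentralised-scoring / centralised-allocation
# pattern. The score() function is a stand-in for the trained RL value
# estimate; the real system learns this value rather than computing it.
from dataclasses import dataclass

@dataclass
class Vehicle:
    pos: tuple       # current (x, y) location
    capacity: int    # remaining capacity

@dataclass
class Customer:
    pos: tuple
    demand: int
    window: tuple    # (earliest, latest) allowed service time

def score(vehicle, customer, now=0.0):
    """Stand-in for the learned value: returns None if serving the
    customer is infeasible, otherwise the negative travel distance
    (closer is better)."""
    if customer.demand > vehicle.capacity:
        return None
    dist = ((vehicle.pos[0] - customer.pos[0]) ** 2 +
            (vehicle.pos[1] - customer.pos[1]) ** 2) ** 0.5
    if now + dist > customer.window[1]:  # cannot arrive within the window
        return None
    return -dist

def allocate(vehicles, customers):
    """Centralised heuristic: commit each customer to the vehicle that
    reported the highest value, updating remaining capacity as we go."""
    assignment = {}
    for ci, c in enumerate(customers):
        best = max(
            ((s, vi) for vi, v in enumerate(vehicles)
             if (s := score(v, c)) is not None),
            default=None,
        )
        if best is not None:
            _, vi = best
            assignment[ci] = vi
            vehicles[vi].capacity -= c.demand
    return assignment
```

Because each vehicle's scores can be computed independently, the expensive evaluation step parallelises naturally across the fleet; only the final (cheap) allocation pass is centralised, which is what makes the approach suitable for real-time, dynamic settings where new customers appear mid-execution.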