Vehicle Routing Problems (VRPs) in real-world applications often come with various constraints, thereby bringing additional computational challenges to exact solution methods and heuristic search approaches. The recent idea of learning heuristic move patterns from sample data has become increasingly promising for reducing solution development costs. However, using learning-based approaches to address more types of constrained VRPs remains a challenge. The difficulty lies in controlling constraint violations while searching for optimal solutions. To overcome this challenge, we propose a Reinforcement Learning based method to solve soft-constrained VRPs by incorporating the Lagrangian relaxation technique and using constrained policy optimization. We apply the method to three common types of VRPs, the Travelling Salesman Problem with Time Windows (TSPTW), the Capacitated VRP (CVRP) and the Capacitated VRP with Time Windows (CVRPTW), to show the generalizability of the proposed method. Comparing against existing RL-based methods and open-source heuristic solvers, we demonstrate its competitive performance in finding solutions with a good balance of travel distance, constraint violations and inference speed.
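To illustrate the core idea mentioned above, the following is a minimal sketch (not the paper's implementation) of how Lagrangian relaxation converts a soft-constrained routing objective into an unconstrained one: the constraint violation is folded into the reward with a multiplier, and the multiplier is updated by dual ascent. The function names, the toy episode data, and the step size are illustrative assumptions.

```python
# Hedged sketch of Lagrangian relaxation for a soft-constrained VRP objective:
# min cost(x) s.t. violation(x) <= 0  becomes  min cost(x) + lam * violation(x),
# with the multiplier lam adapted by dual ascent on the observed violation.
# All names and values below are hypothetical, for illustration only.

def lagrangian_reward(distance, violation, lam):
    """Penalized RL reward: negative tour length minus weighted violation."""
    return -(distance + lam * violation)

def update_multiplier(lam, violation, step=0.1):
    """Dual ascent: grow lam while constraints are violated; keep lam >= 0."""
    return max(0.0, lam + step * violation)

# Toy episode trace: (tour distance, total time-window violation)
episodes = [(10.0, 4.0), (11.0, 2.0), (12.0, 0.0)]
lam = 0.0
for dist, viol in episodes:
    reward = lagrangian_reward(dist, viol, lam)  # signal for the policy update
    lam = update_multiplier(lam, viol)           # tighten penalty if violating

# lam has grown only on episodes with violations: 0.0 + 0.1*4 + 0.1*2 = 0.6
print(round(lam, 2))  # → 0.6
```

In this scheme the policy is still trained with an ordinary RL objective; only the reward changes, which is what allows the same method to carry over across TSPTW, CVRP and CVRPTW with different violation definitions.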