Learning to solve combinatorial optimization problems, such as the vehicle routing problem, offers great computational advantages over classical operations research solvers and heuristics. Recently developed deep reinforcement learning approaches either iteratively improve an initially given solution or sequentially construct a set of individual tours. However, most existing learning-based approaches cannot handle a fixed number of vehicles and thus bypass the complex problem of assigning customers to an a priori given number of available vehicles. This makes them less suitable for real applications, as many logistics service providers rely on solutions computed for a specific bounded fleet size and cannot accommodate short-term changes to the number of vehicles. In contrast, we propose a powerful supervised deep learning framework that constructs a complete tour plan from scratch while respecting an a priori fixed number of available vehicles. In combination with an efficient post-processing scheme, our supervised approach is not only much faster and easier to train but also achieves competitive results that incorporate the practical aspect of vehicle costs. In thorough controlled experiments, we compare our method to multiple state-of-the-art approaches, demonstrating stable performance while using fewer vehicles, and shed some light on existing inconsistencies in the experimentation protocols of the related work.