The vehicle routing problem is a well-known class of NP-hard combinatorial optimisation problems. Traditional solution methods rely on either carefully designed heuristics or time-consuming metaheuristics. Recent work in reinforcement learning offers a promising alternative, but has struggled to match traditional methods in solution quality. This paper proposes a hybrid approach that combines reinforcement learning, policy rollouts, and a satisfiability solver to enable a tunable tradeoff between computation time and solution quality. Results on a popular public data set show that the algorithm produces solutions closer to optimal than existing learning-based approaches, with shorter computation times than metaheuristics. The approach requires minimal design effort and can solve unseen problems of arbitrary scale without additional training. Furthermore, the methodology is generalisable to other combinatorial optimisation problems.