Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale poorly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP), the vehicle routing problem (VRP) and the TSP with time windows (TSPTW), and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive with strong alternatives such as LKH, while also outperforming most other 'neural approaches' for solving TSPs, VRPs and TSPTWs with 100 nodes.
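To make the core idea concrete, the sketch below shows one way a "policy-restricted" DP for the TSP could look: the exact DP recursion over states (set of visited nodes, current node) is kept, but at every step only the B states ranked highest by a learned edge-scoring policy are retained. This is a minimal illustration under stated assumptions, not the paper's implementation: the precomputed `heat[i][j]` matrix stands in for the neural network's edge predictions, and the function name, the scoring rule (sum of heat values along the partial tour), and the beam size are all illustrative choices.

```python
import heapq

def restricted_dp_tsp(dist, heat, beam_size=128):
    """Exact DP over states (visited set, current node), restricted by a policy.

    dist[i][j]: travel cost between nodes i and j.
    heat[i][j]: placeholder for learned edge scores (higher = more promising);
                in DPDP these would come from a trained neural network.
    Keeps only the `beam_size` highest-scoring states per expansion step.
    """
    n = len(dist)
    # state -> (cost so far, policy score so far, partial tour); start at node 0
    beam = {(frozenset([0]), 0): (0.0, 0.0, [0])}
    for _ in range(n - 1):
        expansions = {}
        for (visited, cur), (cost, score, tour) in beam.items():
            for nxt in range(n):
                if nxt in visited:
                    continue
                key = (visited | {nxt}, nxt)
                cand = (cost + dist[cur][nxt],
                        score + heat[cur][nxt],
                        tour + [nxt])
                # Standard DP dominance: keep the cheapest tour per state.
                if key not in expansions or cand[0] < expansions[key][0]:
                    expansions[key] = cand
        # Policy step: restrict the state space to the B most promising states.
        beam = dict(heapq.nlargest(beam_size, expansions.items(),
                                   key=lambda kv: kv[1][1]))
    # Close the tour back to the depot and return the best complete solution.
    return min((cost + dist[cur][0], tour + [0])
               for (visited, cur), (cost, score, tour) in beam.items())
```

With `beam_size` set to infinity this reduces to the classical exponential Held-Karp-style DP, which guarantees optimality; shrinking the beam trades that guarantee for tractability, and the quality of the result then hinges on how well the learned scores identify edges that appear in good solutions.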