Applying machine learning to combinatorial optimization problems (COPs) has the potential to improve both efficiency and accuracy. However, existing learning-based solvers often generalize poorly when the problem distribution or scale changes. In this paper, we propose ASP (Adaptive Staircase Policy Space Response Oracle), an approach that addresses these generalization issues and learns a universal neural solver. ASP consists of two components: Distributional Exploration, which uses Policy Space Response Oracles to improve the solver's handling of unknown distributions, and Persistent Scale Adaption, which improves scalability through curriculum learning. We test ASP on several challenging COPs, including the traveling salesman problem (TSP), the vehicle routing problem, and the prize collecting TSP, as well as on real-world instances from TSPLib and CVRPLib. Our results show that, even with the same model size and a weak training signal, ASP helps neural solvers explore and adapt to unseen distributions and varying scales, achieving superior performance. In particular, compared with the same neural solvers trained under a standard pipeline, ASP reduces the optimality gap by 90.9% and 47.43% on generated and real-world instances for TSP, and by 19% and 45.57% for CVRP.
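To make the reported metric concrete, the sketch below computes an optimality gap and the relative decrease between two gaps. It assumes the common definition of the gap (solver cost relative to an optimal or best-known reference cost) and assumes the percentages above are relative reductions of that gap; the abstract does not spell out either. The function names and the numbers are hypothetical and are not taken from the paper.

```python
def optimality_gap(solver_cost: float, reference_cost: float) -> float:
    """Relative excess of a solver's cost over a reference (optimal or best-known) cost."""
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return (solver_cost - reference_cost) / reference_cost


def relative_decrease(baseline_gap: float, new_gap: float) -> float:
    """Fraction by which the gap shrinks relative to the baseline gap (assumed reading)."""
    if baseline_gap <= 0:
        raise ValueError("baseline_gap must be positive")
    return (baseline_gap - new_gap) / baseline_gap


# Hypothetical tour costs, for illustration only.
baseline = optimality_gap(solver_cost=110.0, reference_cost=100.0)  # 10% gap
improved = optimality_gap(solver_cost=101.0, reference_cost=100.0)  # 1% gap
print(f"relative decrease: {relative_decrease(baseline, improved):.1%}")
```

Under this reading, a 90.9% decrease means the remaining gap is roughly one eleventh of the baseline gap, not that the solver is within 9.1% of optimal.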