In this paper, we study the generalization ability of deep learning-based solvers for the Traveling Salesman Problem (TSP). Specifically, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator}, in which the Solver aims to solve the task instances provided by the Generator, and the Generator aims to generate increasingly difficult instances that improve the Solver. Grounded in \textsl{Policy Space Response Oracles} (PSRO) methods, our two-player framework outputs a population of best-responding Solvers, over which we can mix to obtain a combined model that achieves the least exploitability against the Generator, and thereby the most generalizable performance across different TSP tasks. We conduct experiments on TSP instances of various types and sizes. Results suggest that our Solvers achieve state-of-the-art performance even on tasks the Solver never encountered during training, whilst the performance of other deep learning-based solvers drops sharply due to over-fitting. On real-world instances from \textsc{TSPLib}, our method also attains a \textbf{12\%} improvement, in terms of optimality gap, over the best baseline model. To demonstrate the principle of our framework, we study the learning outcome of the proposed two-player game and show that the exploitability of the Solver population decreases during training, and that the population, together with the Generator, eventually approximates the Nash equilibrium.
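To make the notions of mixing over a Solver population and of exploitability concrete, the following is a minimal sketch, not the method of this paper. It assumes the game has already been reduced to a small table `gap[i][j]` holding the optimality gap of Solver `i` on instances from Generator `j` (all numbers are hypothetical), and it approximates the equilibrium mixture with plain fictitious play rather than with the PSRO meta-solver. The function names are illustrative only.

```python
def expected_gap_per_generator(gap, solver_mix):
    """Expected gap of the mixed Solver against each pure Generator strategy."""
    n, m = len(gap), len(gap[0])
    return [sum(solver_mix[i] * gap[i][j] for i in range(n)) for j in range(m)]


def expected_gap_per_solver(gap, generator_mix):
    """Expected gap of each pure Solver against the mixed Generator."""
    return [sum(g * p for g, p in zip(row, generator_mix)) for row in gap]


def exploitability(gap, solver_mix, generator_mix):
    """Sum of both players' best-response gains within the restricted game.

    The Solver minimizes the gap and the Generator maximizes it, so this
    quantity is non-negative and equals zero at a Nash equilibrium.
    """
    worst_case_for_solver = max(expected_gap_per_generator(gap, solver_mix))
    best_case_for_solver = min(expected_gap_per_solver(gap, generator_mix))
    return worst_case_for_solver - best_case_for_solver


def fictitious_play(gap, iterations=5000):
    """Approximate the equilibrium mixtures of a zero-sum matrix game."""
    n, m = len(gap), len(gap[0])
    solver_counts = [0] * n
    generator_counts = [0] * m
    solver_counts[0] = 1      # arbitrary initial pure strategies
    generator_counts[0] = 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixture.
        solver_values = expected_gap_per_solver(gap, generator_counts)
        generator_values = expected_gap_per_generator(gap, solver_counts)
        solver_counts[solver_values.index(min(solver_values))] += 1
        generator_counts[generator_values.index(max(generator_values))] += 1
    total_s, total_g = sum(solver_counts), sum(generator_counts)
    return ([c / total_s for c in solver_counts],
            [c / total_g for c in generator_counts])


if __name__ == "__main__":
    # Hypothetical optimality gaps: rows are Solvers, columns are Generators.
    gap = [
        [0.02, 0.10, 0.07],
        [0.09, 0.03, 0.08],
        [0.06, 0.07, 0.04],
    ]
    solver_mix, generator_mix = fictitious_play(gap)
    print("Solver mixture:   ", [round(p, 3) for p in solver_mix])
    print("Generator mixture:", [round(p, 3) for p in generator_mix])
    print("Exploitability:   ", exploitability(gap, solver_mix, generator_mix))
```

In this simplified setting, a single Solver that specializes in one instance distribution is heavily exploitable by a Generator that shifts to another distribution, whereas the mixture is not. This is the intuition behind mixing over the population instead of selecting one model.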