We address the Traveling Salesman Problem (TSP), a well-known NP-hard combinatorial optimization problem. We propose a variable strategy reinforced approach, denoted VSR-LKH, which combines three reinforcement learning methods (Q-learning, Sarsa, and Monte Carlo) with the well-known Lin-Kernighan-Helsgaun (LKH) algorithm for the TSP. VSR-LKH replaces the inflexible traversal operation in LKH with reinforcement learning, so that the program learns which choice to make at each search step. Experimental results on 111 TSP benchmarks from TSPLIB, with up to 85,900 cities, demonstrate the excellent performance of the proposed method.
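To make the three reinforcement learning methods named in the abstract concrete, the following is a minimal sketch of their textbook tabular value updates. It is not the VSR-LKH implementation: the abstract does not specify the states, actions, rewards, or how the methods are combined, so the integer state and action encoding, the function names, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are all illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical encoding: states and actions are plain integers
# (for a TSP search one might think of cities and candidate neighbors).
State = int
Action = int
QTable = Dict[Tuple[State, Action], float]


def q_learning_update(q: QTable, s: State, a: Action, r: float,
                      s_next: State, next_actions: Iterable[Action],
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Off-policy update: bootstrap from the best action value in s_next."""
    best_next = max((q.get((s_next, b), 0.0) for b in next_actions), default=0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(q: QTable, s: State, a: Action, r: float,
                 s_next: State, a_next: Action,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
    """On-policy update: bootstrap from the action actually taken in s_next."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)


def monte_carlo_update(q: QTable, episode: List[Tuple[State, Action, float]],
                       alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Every-visit, constant-alpha Monte Carlo update.

    `episode` is a list of (state, action, reward) triples. The discounted
    return is accumulated backwards, so no bootstrapping is involved.
    """
    g = 0.0
    for s, a, r in reversed(episode):
        g = r + gamma * g
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (g - old)
```

The three rules differ only in their update target: Q-learning uses the maximum estimated value of the next state, Sarsa uses the value of the action actually chosen next, and Monte Carlo uses the full observed return of an episode.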