The Traveling Salesman Problem (TSP) is a classical NP-hard combinatorial optimization problem with many practical variants. The Lin-Kernighan-Helsgaun (LKH) algorithm is one of the state-of-the-art local search algorithms for the TSP, and LKH-3 is a powerful extension of LKH that can solve many TSP variants. Both LKH and LKH-3 associate a candidate set to each city to improve the algorithm efficiency, and have two different methods, called $\alpha$-measure and POPMUSIC, to decide the candidate sets. In this work, we first propose a Variable Strategy Reinforced LKH (VSR-LKH) algorithm, which incorporates three reinforcement learning methods (Q-learning, Sarsa, and Monte Carlo) with the LKH algorithm, for solving the TSP. We further propose a new algorithm, denoted as VSR-LKH-3, that combines the variable strategy reinforcement learning method with LKH-3 for typical TSP variants, including the TSP with time windows (TSPTW) and Colored TSP (CTSP). The proposed algorithms replace the inflexible traversal operations in LKH and LKH-3 and let the algorithms learn to make a choice at each search step by reinforcement learning. Both LKH and LKH-3, with either the $\alpha$-measure or the POPMUSIC method, can be significantly improved by our methods. Specifically, empirical results on 236 public and widely-used TSP benchmarks with up to 85,900 cities demonstrate the excellent performance of VSR-LKH, and the extended VSR-LKH-3 also significantly outperforms the state-of-the-art heuristics for TSPTW and CTSP.
翻译:旅行销售商问题( TSP) 是典型的NP- 硬组合组合优化问题, 有许多实用的变体。 Lin- Kernighan- Helsgaun (LKH) 算法是TSP 最先进的本地搜索算法之一, LKH-3 是LKH 的强大延伸, 可以解决许多 TSP变体。 LKH 和 LKH-3 将一个候选人配置到每个城市, 以提高算法效率, 并有两种不同的方法, 叫做 $\alpha$- 计量和 POPMUSIC, 来决定候选人的设置。 在这项工作中, 我们首先提出一个变量战略强化 LKH (VSR- LKH) (VSR- LK- PH) (VSR- SLH) 选项。 我们进一步提出一个新的算法, 称为 VSR- PSLH- PH 3, 将变量强化学习方法与 LKTH-3 (TSP) 典型的变式变式变式变式计算方法结合起来, 包括时间窗口( TSTPSP( TSP) 和时间窗口( TTW) 和LSAL) 快速学习的变式。