Deep reinforcement learning (RL) has proved to be a competitive heuristic for solving small instances of the traveling salesman problem (TSP), but its performance on larger instances remains insufficient. Since training on large instances is impractical, we design a novel deep RL approach with a focus on generalizability. Our proposal, which consists of a simple deep learning architecture trained with novel RL training techniques, exploits two main ideas. First, we exploit equivariance to facilitate training. Second, we interleave efficient local search heuristics with the usual RL training to smooth the value landscape. To validate the whole approach, we empirically evaluate our proposal on random and realistic TSP instances against relevant state-of-the-art deep RL methods. Moreover, we present an ablation study to understand the contribution of each of its components.
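The abstract does not say which local search heuristics are interleaved with RL training. As an assumption for illustration only, the sketch below uses 2-opt, a standard TSP local search that repeatedly replaces two crossing or otherwise costly edges with a cheaper pair by reversing the tour segment between them. The helper names `tour_length` and `two_opt` are hypothetical, and this is not the authors' implementation. In an interleaved scheme, a routine like this could be applied to tours produced by the policy during training, but how the paper actually combines the two is not specified here.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of the closed tour visiting `points` in `tour` order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt(points, tour, max_passes=100):
    """First-improvement 2-opt (assumed heuristic, not taken from the paper).

    Repeatedly replaces edges (a, b) and (c, d) with (a, c) and (b, d) when that
    shortens the tour, until a full pass finds no improving move.
    """
    tour = list(tour)
    n = len(tour)
    for _ in range(max_passes):
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                # When i == 0 and j == n - 1 the two edges share tour[0]; skip that pair.
                if i == 0 and j == n - 1:
                    continue
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)) - (math.dist(a, b) + math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment b..c swaps in the two cheaper edges.
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
        if not improved:
            break
    return tour


if __name__ == "__main__":
    # Unit square visited in a self-crossing order: length 2 + 2*sqrt(2).
    square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    best = two_opt(square, [0, 1, 2, 3])
    print(best)                       # prints [0, 2, 1, 3]
    print(tour_length(square, best))  # prints 4.0
```

On the square example, a single 2-opt move removes the crossing and yields the optimal perimeter tour of length 4.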