The Traveling Salesman Problem (TSP), a classic routing optimization problem that originally arose in transportation and logistics, has become a critical task in broader domains such as manufacturing and biology. Recently, Deep Reinforcement Learning (DRL) has been increasingly employed to solve the TSP owing to its high inference efficiency. Nevertheless, most existing end-to-end DRL algorithms perform well only on small TSP instances and can hardly generalize to large-scale ones, because their memory consumption and computation time soar as the problem scale grows. In this paper, we propose a novel end-to-end DRL approach based on a multi-pointer Transformer, referred to as Pointerformer. In particular, Pointerformer adopts a reversible residual network in the encoder and a multi-pointer network in the decoder to effectively contain the memory consumption of the encoder-decoder architecture. To further improve solution quality, Pointerformer employs a feature augmentation method that exploits the symmetries of the TSP at both the training and inference stages, as well as an enhanced context embedding approach that includes more comprehensive context information in the query. Extensive experiments on a randomly generated benchmark and a public benchmark show that Pointerformer achieves results comparable to state-of-the-art (SOTA) DRL approaches on most small-scale TSP instances, while also generalizing well to large-scale TSPs.
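The memory saving attributed to the reversible residual encoder comes from the fact that a reversible block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be cached for backpropagation. The following is a minimal sketch of a generic RevNet-style coupling on plain Python lists, assuming the standard form y1 = x1 + F(x2), y2 = x2 + G(y1). The functions `toy_f` and `toy_g` are hypothetical stand-ins for the attention and feed-forward sublayers; this is not Pointerformer's actual implementation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def rev_forward(x1: Vector, x2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    # Coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def rev_inverse(y1: Vector, y2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    # Invert the coupling: inputs are recomputed from outputs,
    # so they do not have to be stored during the forward pass.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


# Hypothetical sublayers (placeholders for attention and feed-forward).
def toy_f(v: Vector) -> Vector:
    return [math.tanh(0.5 * t) for t in v]


def toy_g(v: Vector) -> Vector:
    return [0.25 * t * t for t in v]


if __name__ == "__main__":
    x1, x2 = [0.1, -0.4, 0.7], [0.3, 0.2, -0.5]
    y1, y2 = rev_forward(x1, x2, toy_f, toy_g)
    r1, r2 = rev_inverse(y1, y2, toy_f, toy_g)
    # Reconstruction matches the original inputs up to floating-point error.
    assert all(abs(a - b) < 1e-12 for a, b in zip(r1 + r2, x1 + x2))
    print("inputs recovered:", [round(v, 6) for v in r1 + r2])
```

Because each block can be inverted, a stack of such blocks only needs to keep the final outputs in memory, at the cost of recomputing F and G during the backward pass.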
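The symmetries referred to above can be illustrated for cities in the unit square: reflecting or rotating all coordinates leaves every pairwise distance, and therefore every tour length, unchanged, so one instance yields several equivalent inputs. The abstract does not spell out Pointerformer's exact feature construction; the sketch below instead assumes the eight-fold dihedral augmentation popularized by POMO, and the helper names `augment` and `tour_length` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

# The eight symmetries of the unit square (dihedral group D4) as coordinate maps.
TRANSFORMS: List[Callable[[float, float], Point]] = [
    lambda x, y: (x, y),          # identity
    lambda x, y: (y, x),          # reflect across y = x
    lambda x, y: (x, 1 - y),      # reflect vertically
    lambda x, y: (y, 1 - x),      # rotate 90 degrees
    lambda x, y: (1 - x, y),      # reflect horizontally
    lambda x, y: (1 - y, x),      # rotate 270 degrees
    lambda x, y: (1 - x, 1 - y),  # rotate 180 degrees
    lambda x, y: (1 - y, 1 - x),  # reflect across y = 1 - x
]


def augment(coords: Sequence[Point]) -> List[List[Point]]:
    """Return eight transformed copies of one TSP instance."""
    return [[t(x, y) for x, y in coords] for t in TRANSFORMS]


def tour_length(coords: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour visiting cities in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


if __name__ == "__main__":
    cities = [(0.1, 0.2), (0.8, 0.3), (0.5, 0.9), (0.2, 0.7)]
    order = [0, 1, 2, 3]
    lengths = [tour_length(c, order) for c in augment(cities)]
    # Every augmented copy has the same tour length as the original.
    assert all(abs(l - lengths[0]) < 1e-9 for l in lengths)
    print(len(lengths), "equivalent instances, length", round(lengths[0], 4))
```

Under this assumption, a model can decode all eight copies at inference time and keep the shortest tour, and the copies can serve as extra equivalent samples during training.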