The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses MinMax mTSP, where the objective is to minimize the max tour length among all agents. Many robotic deployments require recomputing potentially large mTSP instances frequently, making the natural trade-off between computing time and solution quality of great importance. However, exact and heuristic algorithms become inefficient as the number of cities increases, due to their computational complexity. Encouraged by the recent developments in deep reinforcement learning (dRL), this work approaches the mTSP as a cooperative task and introduces DAN, a decentralized attention-based neural method that aims at tackling this key trade-off. In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting each other's future decisions. Our model relies on the Transformer architecture and is trained using multi-agent RL with parameter sharing, providing natural scalability to the numbers of agents and cities. Our experimental results on small- to large-scale mTSP instances ($50$ to $1000$ cities and $5$ to $20$ agents) show that DAN is able to match or outperform state-of-the-art solvers while keeping planning times low. In particular, given the same computation time budget, DAN outperforms all conventional and dRL-based baselines on larger-scale instances (more than 100 cities, more than 5 agents), and exhibits enhanced agent collaboration. A video explaining our approach and presenting our results is available at \url{https://youtu.be/xi3cLsDsLvs}.
翻译:多重旅行销售商问题( MTSP) 是一个众所周知的NP-硬性问题,涉及众多真实世界应用程序。 特别是, 这项工作针对MinMax MISTP, 目标是最大限度地减少所有代理商的最大巡航长度。 许多机器人部署需要经常重新计算潜在的大型 MTSP 案例, 使得计算时间和解决方案质量之间的自然权衡变得非常重要。 然而, 精确和超常的算法随着城市数量的计算复杂性的增加而变得效率低下。 受到深层强化学习( dRL) 的最新发展鼓励。 这项工作将 MTSP 作为一种合作任务, 引入了基于分散关注的神经方法, 旨在解决所有代理商的最大巡航距离。 在丹州, 代理商通过预测对方的未来决定, 完全分散的政策可以合作构建一个巡航旅行。 我们的模型依赖于变换式结构, 并经过培训, 使用多代理商RL 共享参数, 向代理商和城市提供自然的可扩展性。 我们关于小型至大型MTSP的实验结果, 引入一个合作任务( 500美元) 分散的神经基数据, 显示我们最高级的代理商为50美元, 显示一个稳定的城市, 常规代理商为20美元, 显示一个稳定的代理商, 显示一个稳定的市, 相同的预算周期为20格式, 。