Recently, deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization (CO) problems. However, most DRL solvers can only scale to a few hundred nodes for combinatorial optimization problems on graphs, such as the Traveling Salesman Problem (TSP). This paper addresses the scalability challenge in large-scale combinatorial optimization by proposing a novel approach, namely, DIMES. Unlike previous DRL methods, which suffer from costly autoregressive decoding or iterative refinement of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions. Such a continuous space allows stable REINFORCE-based training and fine-tuning via massively parallel sampling. We further propose a meta-learning framework to enable the effective initialization of model parameters in the fine-tuning stage. Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
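To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of REINFORCE-style training over a continuous edge-score parameterization of TSP tours. The instance size, batch size, optimizer, and direct sampling from a raw score matrix are illustrative assumptions; the paper's actual method produces the continuous parameters with a graph neural network and adds meta-learned initialization, both omitted here.

```python
# Hypothetical sketch: REINFORCE over a continuous "heatmap" of edge scores
# for one small random TSP instance. All hyperparameters are illustrative.
import torch


def sample_tour(theta):
    """Sample a tour autoregressively from edge-score logits; return tour and log-prob."""
    n = theta.shape[0]
    visited = torch.zeros(n, dtype=torch.bool)
    tour, cur = [0], 0
    visited[0] = True
    log_prob = torch.zeros(())
    for _ in range(n - 1):
        logits = theta[cur].masked_fill(visited, float("-inf"))  # mask visited cities
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[nxt])
        tour.append(nxt)
        visited[nxt] = True
        cur = nxt
    return tour, log_prob


def tour_length(tour, coords):
    """Total Euclidean length of a closed tour."""
    idx = torch.tensor(tour + [tour[0]])
    pts = coords[idx]
    return (pts[1:] - pts[:-1]).norm(dim=-1).sum()


torch.manual_seed(0)
n = 20
coords = torch.rand(n, 2)                          # random TSP instance (assumption)
theta = torch.zeros(n, n, requires_grad=True)      # continuous parameters of the solution distribution
opt = torch.optim.Adam([theta], lr=0.1)

for step in range(200):
    # Draw a batch of candidate tours (parallel sampling, done sequentially here for clarity).
    samples = [sample_tour(theta) for _ in range(16)]
    costs = torch.stack([tour_length(t, coords) for t, _ in samples])
    log_probs = torch.stack([lp for _, lp in samples])
    baseline = costs.mean()                        # simple variance-reduction baseline
    loss = ((costs - baseline).detach() * log_probs).mean()  # REINFORCE surrogate loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the parameters live in a continuous space rather than in discrete solutions, the same gradient estimator can also be used to fine-tune the distribution on each test instance, which is the role massively parallel sampling plays in the abstract.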