Despite the success of Neural Combinatorial Optimization methods for end-to-end heuristic learning, out-of-distribution generalization remains a challenge. In this paper, we present a novel formulation of combinatorial optimization (CO) problems as Markov Decision Processes (MDPs) that effectively leverages symmetries of the CO problems to improve out-of-distribution robustness. Starting from the standard MDP formulation of constructive heuristics, we introduce a generic transformation based on bisimulation quotienting (BQ) in MDPs. This transformation allows us to reduce the state space by accounting for the intrinsic symmetries of the CO problem, and it facilitates solving the MDP. We illustrate our approach on the Traveling Salesman, Capacitated Vehicle Routing, and Knapsack Problems. We present a BQ reformulation of these problems and introduce a simple attention-based policy network that we train by imitation of (near-)optimal solutions to small instances from a single distribution. We obtain new state-of-the-art generalization results for instances with up to 1000 nodes from synthetic and realistic benchmarks that vary in both size and node distribution.
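To make the quotienting idea concrete, below is a minimal Python sketch (ours, not code from the paper) for the TSP case: in the standard constructive MDP the state is the full partial tour, whereas under bisimulation quotienting all partial tours that start at the same node, end at the same node, and leave the same set of cities unvisited collapse to one state. The class and function names (`BQStateTSP`, `bq_state`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BQStateTSP:
    """Hypothetical quotiented TSP state: order of visited nodes is discarded."""
    origin: int                 # first node of the tour (needed to close it)
    current: int                # last visited node (the next step starts here)
    remaining: frozenset[int]   # set of unvisited nodes

def bq_state(partial_tour: list[int], n_nodes: int) -> BQStateTSP:
    """Quotient map: collapse a partial tour to its BQ equivalence class."""
    return BQStateTSP(
        origin=partial_tour[0],
        current=partial_tour[-1],
        remaining=frozenset(range(n_nodes)) - set(partial_tour),
    )

# Two different partial tours over 6 nodes map to the same quotiented state,
# since they share origin 0, current node 4, and remaining nodes {3, 5}:
assert bq_state([0, 2, 1, 4], 6) == bq_state([0, 1, 2, 4], 6)
```

Because a policy defined on such quotiented states cannot distinguish bisimilar partial solutions, it is forced to respect the problem's symmetry; this is the intuition behind the improved out-of-distribution robustness claimed in the abstract.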