Deep reinforcement learning (DRL)-based combinatorial optimization (CO) methods (i.e., DRL-NCO) have shown significant advantages over conventional CO solvers. DRL-NCO can learn CO solvers that rely less on problem-specific expert domain knowledge, which heuristic methods require, and on supervised labeled data, which supervised learning methods require. This paper presents Sym-NCO, a novel regularizer-based training scheme that leverages universal symmetricities in various CO problems and their solutions. Leveraging symmetricities such as rotational and reflectional invariance can greatly improve the generalization capability of DRL-NCO, because it allows the learned solver to exploit the symmetricities shared by all problems in the same CO problem class. Our experimental results verify that Sym-NCO greatly improves the performance of DRL-NCO methods on four CO tasks, namely the traveling salesman problem (TSP), the capacitated vehicle routing problem (CVRP), the prize-collecting TSP (PCTSP), and the orienteering problem (OP), without using problem-specific expert domain knowledge. Remarkably, Sym-NCO outperformed not only existing DRL-NCO methods but also a competitive conventional solver, iterated local search (ILS), on PCTSP while running 240× faster. Our source code is available at https://github.com/alstn12088/Sym-NCO.
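To make the rotational and reflectional invariance concrete, the following is a minimal sketch for Euclidean TSP. It is not the authors' Sym-NCO code from the linked repository. It assumes 2D node coordinates in the unit square, a reward equal to the negative tour length, and rotations about the square's center (which may move some points outside the square). The helper `shared_baseline_advantages` is a hypothetical illustration of one way such symmetry can feed into DRL training: rewards from several symmetric copies of the same instance are compared against a common baseline, because all copies share the same optimal value.

```python
import math
import random


def tour_length(coords, tour):
    """Total Euclidean length of a closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def rotate_reflect(coords, theta, reflect=False):
    """Optionally reflect across x = 0.5, then rotate by theta about (0.5, 0.5).

    Both operations are isometries, so every tour keeps its length.
    """
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in coords:
        if reflect:
            x = 1.0 - x
        dx, dy = x - 0.5, y - 0.5
        out.append((0.5 + c * dx - s * dy, 0.5 + s * dx + c * dy))
    return out


def shared_baseline_advantages(rewards):
    """Hypothetical helper: rewards[l][k] is the reward of sample k on symmetric copy l.

    The baseline is the mean over all copies and samples, since symmetric
    copies of one instance have the same optimal tour length.
    """
    flat = [r for row in rewards for r in row]
    baseline = sum(flat) / len(flat)
    return [[r - baseline for r in row] for row in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(10)]
    tour = list(range(10))
    rng.shuffle(tour)
    base = tour_length(coords, tour)
    # The same tour has the same length on every rotated/reflected copy.
    for theta in (0.0, math.pi / 4, math.pi / 2, 2.0):
        for reflect in (False, True):
            moved = rotate_reflect(coords, theta, reflect)
            assert math.isclose(tour_length(moved, tour), base, rel_tol=1e-9)
    print(shared_baseline_advantages([[-3.0, -5.0], [-4.0, -4.0]]))  # [[1.0, -1.0], [0.0, 0.0]]
```

Because the transformed copies are guaranteed to share the same optimal solution value, differences in reward across copies reflect only the solver's sensitivity to the transformation. This is the kind of signal a symmetry-based regularizer can penalize without any problem-specific expert knowledge.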