Deep reinforcement learning (DRL)-based combinatorial optimization (CO) methods (i.e., DRL-NCO) have shown significant merit over conventional CO solvers, as DRL-NCO can learn CO solvers without supervised labels obtained from a verified solver. This paper presents a novel training scheme, Sym-NCO, that significantly improves the performance of existing DRL-NCO methods. Sym-NCO is a regularizer-based training scheme that leverages universal symmetricities in various CO problems and solutions. Imposing symmetricities such as rotational and reflectional invariance can greatly improve the generalization capability of DRL-NCO, as symmetricities are invariant features shared by certain CO tasks. Our experimental results verify that Sym-NCO greatly improves the performance of DRL-NCO methods on four CO tasks, namely the traveling salesman problem (TSP), capacitated vehicle routing problem (CVRP), prize collecting TSP (PCTSP), and orienteering problem (OP), without employing problem-specific techniques. Remarkably, Sym-NCO outperformed not only the existing DRL-NCO methods but also a competitive conventional solver, the iterative local search (ILS), on PCTSP while running 240 times faster.
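The rotational and reflectional invariance mentioned above can be illustrated on a small TSP instance: rotating or reflecting all city coordinates leaves the length of any fixed tour unchanged. The following is a minimal sketch of that property only, not the Sym-NCO training scheme or its regularizer. The helper names (`tour_length`, `rotate`, `reflect`), the random 20-city instance, and the choice of the unit-square center as the pivot are all assumptions made for illustration.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the order given by tour."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def rotate(coords, theta, center=(0.5, 0.5)):
    """Rotate every point by theta radians around center (hypothetical helper)."""
    cx, cy = center
    c, s = math.cos(theta), math.sin(theta)
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in coords
    ]


def reflect(coords, axis_x=0.5):
    """Mirror every point across the vertical line x = axis_x (hypothetical helper)."""
    return [(2 * axis_x - x, y) for x, y in coords]


# Assumed toy instance: 20 random cities in the unit square and a random tour.
rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(20)]
tour = list(range(20))
rng.shuffle(tour)

base = tour_length(coords, tour)
rotated = tour_length(rotate(coords, math.pi / 3), tour)
reflected = tour_length(reflect(coords), tour)

# The three lengths agree up to floating-point error.
print(math.isclose(base, rotated), math.isclose(base, reflected))
```

Because the objective value of a tour does not change under these transformations, a solver that treats a rotated or reflected instance differently from the original is relying on features that carry no information about solution quality; this is the kind of symmetry the abstract describes exploiting.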