This paper presents a new method for solving an orienteering problem (OP) by breaking it down into two parts: a knapsack problem (KP) and a traveling salesman problem (TSP). A KP solver is responsible for picking nodes, while a TSP solver is responsible for designing the proper path and assisting the KP solver in judging constraint violations. To address constraints, we propose a dual-population coevolutionary algorithm (DPCA) as the KP solver, which simultaneously maintains both feasible and infeasible populations. A dynamic pointer network (DYPN) is introduced as the TSP solver, which takes city locations as inputs and immediately outputs a permutation of nodes. The model, which is trained by reinforcement learning, can capture both the structural and dynamic patterns of the given problem. The model can generalize to other instances with different scales and distributions. Experimental results show that the proposed algorithm can outperform conventional approaches in terms of training, inference, and generalization ability.
翻译:本文通过将其分为两个部分来提出解决一个定向问题的新方法:一个 knapsack 问题(KP) 和一个旅行销售员问题(TSP)。 KP 解答器负责选择节点,而一个 TSP 解答器负责设计正确的路径并协助KP 解答器判断违反约束规定的情况。 为了解决限制问题,我们提议了一种双重人口共变算法(DPCA)作为KP 解析器,它同时同时维持可行和不可行的人口。引入了一个动态指针网络(DYPN)作为 TSP 解答器,它将城市位置作为投入,并立即输出节点的变异。该模型通过强化学习培训,可以捕捉到给定问题的结构性和动态模式。该模型可以概括到不同规模和分布的其他实例。实验结果表明,拟议的算法在培训、推断和概括能力方面可以超越常规方法。