以分解为根据的定向问题深强化学习 (Deep Reinforcement Learning for Orienteering Problems Based on Decomposition)

This paper presents a new method for solving an orienteering problem (OP) by breaking it down into two parts: a knapsack problem (KP) and a traveling salesman problem (TSP). A KP solver is responsible for picking nodes, while a TSP solver is responsible for designing the proper path and assisting the KP solver in judging constraint violations. To address constraints, we propose a dual-population coevolutionary algorithm (DPCA) as the KP solver, which simultaneously maintains both feasible and infeasible populations. A dynamic pointer network (DYPN) is introduced as the TSP solver, which takes city locations as inputs and immediately outputs a permutation of nodes. The model, which is trained by reinforcement learning, can capture both the structural and dynamic patterns of the given problem. The model can generalize to other instances with different scales and distributions. Experimental results show that the proposed algorithm can outperform conventional approaches in terms of training, inference, and generalization ability.

翻译：本文通过将其分为两个部分来提出解决一个定向问题的新方法:一个 knapsack 问题(KP) 和一个旅行销售员问题(TSP)。 KP 解答器负责选择节点,而一个 TSP 解答器负责设计正确的路径并协助KP 解答器判断违反约束规定的情况。为了解决限制问题,我们提议了一种双重人口共变算法(DPCA)作为KP 解析器,它同时同时维持可行和不可行的人口。引入了一个动态指针网络(DYPN)作为 TSP 解答器,它将城市位置作为投入,并立即输出节点的变异。该模型通过强化学习培训,可以捕捉到给定问题的结构性和动态模式。该模型可以概括到不同规模和分布的其他实例。实验结果表明,拟议的算法在培训、推断和概括能力方面可以超越常规方法。

相关内容

深度强化学习

关注 156

深度强化学习 (DRL) 是一种使用深度学习技术扩展传统强化学习方法的一种机器学习方法。传统强化学习方法的主要任务是使得主体根据从环境中获得的奖赏能够学习到最大化奖赏的行为。然而，传统无模型强化学习方法需要使用函数逼近技术使得主体能够学习出值函数或者策略。在这种情况下，深度学习强大的函数逼近能力自然成为了替代人工指定特征的最好手段并为性能更好的端到端学习的实现提供了可能。

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

24+阅读 · 2022年3月19日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日