Recently, there has been an emerging trend of applying deep reinforcement learning to solve the vehicle routing problem (VRP), where a learnt policy governs the selection of the next node to visit. However, existing methods do not handle well the pairing and precedence relationships in the pickup and delivery problem (PDP), a representative variant of VRP. To address this challenging issue, we leverage a novel neural network integrated with a heterogeneous attention mechanism to empower the policy in deep reinforcement learning to automatically select the nodes. In particular, the heterogeneous attention mechanism prescribes attentions for each role of the nodes while taking into account the precedence constraint, i.e., a pickup node must precede its paired delivery node. Further integrated with a masking scheme, the learnt policy is expected to find higher-quality solutions to the PDP. Extensive experimental results show that our method outperforms the state-of-the-art heuristic and deep learning model, respectively, and generalizes well to different distributions and problem sizes.
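To make the role of the masking scheme concrete, below is a minimal sketch (not the authors' implementation) of how the pairing and precedence constraint can be enforced during node selection: a delivery node becomes feasible only after its paired pickup has been visited, and visited nodes are never selected again. The function name feasibility_mask and the pickup_of mapping are hypothetical and introduced here purely for illustration.

    import numpy as np

    def feasibility_mask(visited, pickup_of):
        """Return a boolean mask over nodes; True means the node may be visited next.

        visited   : boolean array, visited[i] is True if node i has been visited.
        pickup_of : dict mapping each delivery node index to its paired pickup index.
        """
        mask = ~np.asarray(visited)              # never revisit a node
        for delivery, pickup in pickup_of.items():
            if not visited[pickup]:              # precedence: the pickup must come first
                mask[delivery] = False           # its delivery is not yet selectable
        return mask

    # Toy instance: node 0 is the depot, nodes 1-2 are pickups, nodes 3-4 their deliveries.
    pickup_of = {3: 1, 4: 2}
    visited = np.array([True, True, False, False, False])   # depot and pickup 1 visited
    print(feasibility_mask(visited, pickup_of))
    # -> [False False  True  True False]: delivery 3 is now feasible, delivery 4 is not

In an attention-based policy of this kind, such a mask is typically applied to the compatibility logits (e.g., by setting infeasible entries to negative infinity) before the softmax that produces the node-selection probabilities.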