Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination -- a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoder fails to make such coordination between vehicles. We propose an attention encoder-LSTM decoder hybrid model, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for coordinated routing of multiple vehicles than the attention-based model.
翻译:强化学习最近在许多组合优化问题中显示出学习质量解决方案的希望,特别是,基于注意的编码器-编码器模式在包括旅行销售员问题(TSP)在内的各种路由问题上显示出很高的效力。不幸的是,它们与无人驾驶飞机(TSP-D)相比,对TSP来说表现不佳,需要协调不同的车队 -- -- 卡车和无人驾驶飞机。在TSP-D中,两部车辆正在同步移动,可能需要等待另一部车辆加入节点。基于注意的编码器解码器无法在车辆之间进行协调。我们建议注意编码器-LSTM解码器混合模型,其中解码器的隐藏状态可以代表所采取行动的顺序。我们的经验证明,这种混合模型改进了纯粹基于注意的解决方案质量和计算效率模式。我们还证实,在微积积聚式车辆冲浪问题(mmCVRP)上进行的实验还证实,混合模型比关注模型更适合多部车辆的协调路线。