Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination -- a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoder fails to make such coordination between vehicles. We propose a hybrid model that uses an attention encoder and a Long Short-Term Memory (LSTM) network decoder, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for the coordinated routing of multiple vehicles than the attention-based model. The proposed model demonstrates comparable results as the operations research baseline methods.
翻译:强化学习最近在许多组合优化问题中显示出学习质量解决方案的希望,特别是,基于关注的编码器-编码器模型显示,在包括旅行销售员问题(TSP)在内的各种路由问题上,在包括旅行销售员问题(TSP)在内的各种路由问题上,它们表现不佳。不幸的是,对于TSP来说,它们与无人机(TSP-D)相比表现不佳,这需要协调不同的车队 -- -- 一辆卡车和无人驾驶飞机。在TSP-D中,两部车辆正在同步移动,可能需要等待另一部车辆加入的节点。基于关注的编码器解码器无法在车辆之间进行协调。我们提出的混合模型使用注意编码器和长期短期内存(LSTM)网络解码器,其中解码器的隐藏状态可以代表所采取行动的顺序。我们的经验证明,这种混合模型在纯粹基于关注的解决方案质量和计算效率模式的基础上改进了这种混合模式。我们在微积分立式车辆问题(mmCVRP)上进行的实验还证实,混合模型比拟的模型更适合协调的车辆基线研究方法。