Reinforcement learning agents whose policies are based on encoder-decoder models with attention mechanisms have been used to approximately solve vehicle routing problems and other combinatorial optimization problems. These techniques are of substantial interest, but they still cannot solve the complex routing problems that arise in realistic settings, which can involve many trucks and complex requirements. With the aim of making reinforcement learning a viable technique for supply chain optimization, we develop new extensions to encoder-decoder models for vehicle routing that allow for complex supply chains, using classical computing today and quantum computing in the future. We make two major generalizations. First, our model allows for routing problems with multiple trucks. Second, we move away from the simple requirement that a truck deliver items from nodes to one special depot node, and instead allow for a complex tensor demand structure. We show how our model, even if trained only for a small number of trucks, can be embedded into a large supply chain to yield viable solutions.