Reinforcement learning has been applied to operations research and has shown promise in solving large combinatorial optimization problems. However, existing works focus on developing neural network architectures for specific problems. They lack the flexibility to incorporate recent advances in reinforcement learning, as well as the flexibility to customize model architectures for operations research problems. In this work, we analyze end-to-end autoregressive models for vehicle routing problems and show that these models can benefit from recent advances in reinforcement learning with a careful re-implementation of the model architecture. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, achieving at least an 8-fold speedup in training time. We hereby introduce RLOR, a flexible framework for Deep Reinforcement Learning for Operation Research. We believe that a flexible framework is key to developing deep reinforcement learning models for operations research problems. The code of our work is publicly available at https://github.com/cpwan/RLOR.
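To make the abstract's claim concrete, the sketch below illustrates the kind of autoregressive decoding loop used by attention-based routing policies: a query for the current node attends over node embeddings, visited nodes are masked out, and the next node is sampled from the resulting distribution. The per-step log-probabilities are what an on-policy algorithm such as PPO would consume. This is a minimal illustration only; the function `rollout` and the projection matrices `W_q`, `W_k` are hypothetical simplifications, not the actual RLOR or CleanRL code.

```python
# Minimal sketch of autoregressive attention-based decoding for a routing problem.
# Assumptions: a single-head attention scorer and a fixed start node 0.
import torch

def rollout(node_embeddings, W_q, W_k):
    # node_embeddings: (batch, n_nodes, d); W_q, W_k: (d, d) projections
    batch, n_nodes, d = node_embeddings.shape
    visited = torch.zeros(batch, n_nodes, dtype=torch.bool)
    current = torch.zeros(batch, dtype=torch.long)            # start at node 0
    visited[torch.arange(batch), current] = True
    tour, log_probs = [current], []
    for _ in range(n_nodes - 1):
        q = node_embeddings[torch.arange(batch), current] @ W_q   # (batch, d)
        k = node_embeddings @ W_k                                  # (batch, n, d)
        scores = (k @ q.unsqueeze(-1)).squeeze(-1) / d ** 0.5      # (batch, n)
        scores = scores.masked_fill(visited, float("-inf"))        # mask visited nodes
        dist = torch.distributions.Categorical(logits=scores)
        current = dist.sample()                                    # next node to visit
        log_probs.append(dist.log_prob(current))                   # used by PPO
        visited[torch.arange(batch), current] = True
        tour.append(current)
    return torch.stack(tour, dim=1), torch.stack(log_probs, dim=1)

# Example usage with random embeddings (batch of 4, 10 nodes, d=16):
emb = torch.randn(4, 10, 16)
W_q, W_k = torch.randn(16, 16), torch.randn(16, 16)
tour, logp = rollout(emb, W_q, W_k)
print(tour.shape, logp.shape)  # torch.Size([4, 10]) torch.Size([4, 9])
```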