Reinforcement learning (RL) has been successfully used in various simulations and computer games. Industry-related applications, such as autonomous mobile robot motion control, however, remain challenging for RL to date. This paper presents an experimental evaluation of predictive RL controllers for optimal mobile robot motion control. Model-predictive control (MPC) is used as the baseline for comparison. Two RL methods are tested: roll-out Q-learning, which may be viewed as MPC with a terminal cost given by a Q-function approximation, and so-called stacked Q-learning, which in turn resembles MPC with the running cost replaced by a Q-function approximation. The experimental platform is a differential-drive mobile robot (Robotis Turtlebot3). The experimental results show that both RL methods outperform the baseline in terms of accumulated cost, with the stacked variant performing best. Together with the series of previous works on stacked Q-learning, this study supports the idea that adapting the running cost of MPC in a Q-learning-inspired manner has the potential to boost performance while retaining the desirable properties of MPC.
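One plausible way to read the relation between the three controllers, sketched here with assumed notation (prediction horizon $N$, running cost $\rho$, state-input pairs $(x_k, u_k)$, and Q-function approximation $\hat{Q}$ are illustrative symbols, not necessarily those used in the paper):

\[
\begin{aligned}
J_{\mathrm{MPC}} &= \sum_{k=0}^{N-1} \rho(x_k, u_k), \\
J_{\mathrm{roll\text{-}out}} &= \sum_{k=0}^{N-1} \rho(x_k, u_k) + \hat{Q}(x_N, u_N), \\
J_{\mathrm{stacked}} &= \sum_{k=0}^{N-1} \hat{Q}(x_k, u_k),
\end{aligned}
\]

where $\hat{Q}$ denotes a learned approximation of the Q-function: roll-out Q-learning appends it as a terminal cost to the MPC criterion, whereas stacked Q-learning substitutes it for the running cost itself.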