The successful operation of mobile robots requires them to adapt rapidly to environmental changes. Toward developing an adaptive decision-making tool for mobile robots, we propose combining meta-reinforcement learning (meta-RL) with model predictive control (MPC). The key idea of our method is to switch between a meta-learned policy and an MPC controller in an event-triggered fashion. Our method uses an off-policy meta-RL algorithm as a baseline to train a policy using transition samples generated by MPC. The MPC module of our algorithm is carefully designed to infer the movements of obstacles via Gaussian process regression (GPR) and to avoid collisions via conditional value-at-risk (CVaR) constraints. By design, our method benefits from two complementary tools. First, high-performance action samples generated by the MPC controller enhance the learning performance and stability of the meta-RL algorithm. Second, through the use of the meta-learned policy, the MPC controller is activated only infrequently, thereby significantly reducing computation time. The results of our simulations on a restaurant service robot show that our algorithm outperforms both baseline methods.
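To make the event-triggered switching concrete, the following is a minimal, self-contained Python sketch; it is not the authors' implementation. The meta-learned policy, the GPR obstacle predictor, and the CVaR-constrained MPC are replaced by simple placeholders (`meta_policy`, `gpr_predict_obstacle`, `mpc_controller`), and the trigger condition, safety threshold, and random-shooting optimizer are all hypothetical choices used only for illustration.

```python
# Hedged sketch of event-triggered switching between a cheap meta-learned
# policy and a CVaR-constrained MPC controller. All components below are
# simplified placeholders, not the paper's actual modules.
import numpy as np

rng = np.random.default_rng(0)

def meta_policy(state: np.ndarray) -> np.ndarray:
    """Stand-in for the meta-learned policy: cheap to evaluate."""
    return -0.5 * state[:2]  # drive toward the origin

def gpr_predict_obstacle(history: np.ndarray, horizon: int) -> np.ndarray:
    """Toy stand-in for GPR prediction of obstacle movement.

    The real method fits a Gaussian process to past obstacle positions;
    here we simply extrapolate the last observed velocity.
    """
    vel = history[-1] - history[-2]
    return history[-1] + np.outer(np.arange(1, horizon + 1), vel)

def cvar(samples: np.ndarray, alpha: float = 0.1) -> float:
    """Conditional value-at-risk: mean of the worst alpha-fraction of samples
    (here, the lowest predicted clearances to the obstacle)."""
    k = max(1, int(np.ceil(alpha * len(samples))))
    return float(np.sort(samples)[:k].mean())

def mpc_controller(state, obstacle_pred, horizon=10):
    """Stand-in for the CVaR-constrained MPC: random-shooting search over
    constant actions, keeping only those whose CVaR clearance is safe."""
    best_u, best_cost = np.zeros(2), np.inf
    for _ in range(200):
        u = rng.uniform(-1.0, 1.0, size=2)
        pos = state[:2] + np.arange(1, horizon + 1)[:, None] * 0.1 * u
        clearance = np.linalg.norm(pos - obstacle_pred, axis=1)
        if cvar(clearance) < 0.5:      # CVaR collision-avoidance constraint
            continue
        cost = np.linalg.norm(pos[-1])  # reach the goal (origin)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

# Event-triggered loop: the expensive MPC is activated only when the CVaR of
# the predicted clearance under the meta-learned action falls below a
# (hypothetical) safety threshold.
state = np.array([2.0, 2.0])
obstacle_hist = np.array([[0.0, 3.0], [0.1, 2.8]])
for t in range(50):
    obs_pred = gpr_predict_obstacle(obstacle_hist, horizon=10)
    u = meta_policy(state)
    pos = state[:2] + np.arange(1, 11)[:, None] * 0.1 * u
    risk = cvar(np.linalg.norm(pos - obs_pred, axis=1))
    if risk < 0.5:                      # trigger event: fall back to MPC
        u = mpc_controller(state, obs_pred)
    state = state + 0.1 * u
    obstacle_hist = np.vstack([obstacle_hist, obs_pred[0]])
```

The design intent mirrors the abstract: the learned policy handles most time steps at negligible cost, while the MPC controller, with its GPR-based obstacle forecast and CVaR constraint, is invoked only when the risk estimate indicates that the learned action may be unsafe.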