Reinforcement learning (RL) has recently achieved great success in various domains. Yet, designing the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents learn the desired behaviour. Using a sparse reward conveniently mitigates these challenges. However, a sparse reward poses a challenge of its own, often resulting in unsuccessful training of the agent. In this paper, we therefore address the sparse reward problem in RL. Our goal is to find an effective alternative to reward shaping that avoids costly human demonstrations and is applicable to a wide range of domains. Hence, we propose to use model predictive control (MPC) as an experience source for training RL agents in sparse reward environments. Without the need for reward shaping, we successfully apply our approach to mobile robot navigation, both in simulation and in real-world experiments with a Kuboki Turtlebot 2. We furthermore demonstrate substantial improvements over pure RL algorithms in terms of success rate as well as the number of collisions and timeouts. Our experiments show that MPC as an experience source improves the agent's learning process for a given task in the case of sparse rewards.
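The core idea, feeding MPC-generated transitions into an off-policy RL agent's replay buffer so that sparse rewards are actually observed during training, can be illustrated with a minimal toy sketch. All names here (`ReplayBuffer`, `mpc_action`, `agent_action`, `mpc_ratio`, and the 1-D dynamics) are illustrative assumptions for this sketch, not the paper's actual implementation or robot environment.

```python
import random

class ReplayBuffer:
    """Minimal FIFO replay buffer storing (s, a, r, s', done) tuples."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

def mpc_action(state):
    # Placeholder: a real MPC would optimize a control sequence over a
    # dynamics model. Here a proportional controller drives the state
    # toward the goal at 0 to stand in for a competent planner.
    return -0.5 * state

def agent_action(state):
    # Placeholder for the RL policy's own (here purely random) exploration.
    return random.uniform(-1.0, 1.0)

def step(state, action):
    # Toy 1-D dynamics with a sparse reward: +1 only near the goal.
    next_state = state + action
    reward = 1.0 if abs(next_state) < 0.1 else 0.0
    done = reward > 0.0
    return next_state, reward, done

def collect_episode(buffer, use_mpc, max_steps=50):
    # Either the MPC controller or the agent acts for a whole episode;
    # both kinds of transitions land in the same replay buffer.
    state = random.uniform(-5.0, 5.0)
    for _ in range(max_steps):
        action = mpc_action(state) if use_mpc else agent_action(state)
        next_state, reward, done = step(state, action)
        buffer.add((state, action, reward, next_state, done))
        state = next_state
        if done:
            break

random.seed(0)
buffer = ReplayBuffer()
mpc_ratio = 0.5  # fraction of episodes driven by MPC instead of the agent
for episode in range(20):
    collect_episode(buffer, use_mpc=random.random() < mpc_ratio)

# MPC-driven episodes reliably reach the goal, so the buffer contains
# rewarded transitions an off-policy learner could exploit.
successes = sum(1 for t in buffer.data if t[2] > 0.0)
print(f"buffer size: {len(buffer.data)}, rewarded transitions: {successes}")
```

The point of the sketch is the mixing: with a purely random policy the sparse reward is almost never seen, while MPC-driven episodes populate the buffer with successful trajectories, which is the experience an off-policy algorithm would then learn from.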