Robotic manipulation remains a largely unsolved problem despite significant advances in robotics and machine learning in recent years. One of the key challenges in manipulation is exploring the dynamics of the environment when there is continuous contact between the objects being manipulated. This paper proposes a model-based active exploration approach that enables efficient learning in sparse-reward robotic manipulation tasks. The proposed method estimates an information gain objective using an ensemble of probabilistic models and deploys model predictive control (MPC) to plan actions online that maximize the expected reward while also performing directed exploration. We evaluate the proposed algorithm in simulation and on a real robot, trained from scratch with our method, on a challenging ball-pushing task on tilted tables, where the target ball position is not known to the agent a priori. Our real-world robot experiment serves as a fundamental application of active exploration in model-based reinforcement learning of complex robotic manipulation tasks.
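The combination described above, using ensemble disagreement as a proxy for information gain inside an MPC planner, can be illustrated with a minimal sketch. This is not the paper's implementation: the linear stand-in dynamics models, the reward function, and all hyperparameters (ensemble size, horizon, candidate count, exploration weight `beta`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_model(seed):
    """Stand-in 'probabilistic model': a randomly initialized linear map.

    In practice each ensemble member would be a learned probabilistic
    dynamics model; the random weights here are an illustrative assumption.
    """
    r = np.random.default_rng(seed)
    W = r.normal(scale=0.1, size=(4, 6))  # state(4) x [state(4), action(2)]
    return lambda s, a: s + np.concatenate([s, a]) @ W.T

ensemble = [make_model(seed) for seed in range(5)]

def exploration_bonus(s, a):
    """Disagreement across ensemble predictions approximates information gain."""
    preds = np.stack([m(s, a) for m in ensemble])  # (n_models, state_dim)
    return preds.var(axis=0).sum()

def plan(s0, horizon=5, n_candidates=64, beta=1.0):
    """Random-shooting MPC: score action sequences by reward + beta * bonus."""
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, 2))
        s, score = s0.copy(), 0.0
        for a in actions:
            # Hypothetical sparse-task surrogate reward plus exploration term.
            score += -np.linalg.norm(s[:2]) + beta * exploration_bonus(s, a)
            s = ensemble[0](s, a)  # roll out with one ensemble member
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action  # receding horizon: execute only the first action

action = plan(np.zeros(4))
print(action.shape)
```

At each control step the planner is re-run from the current state, so exploration pressure (the disagreement bonus) and task reward are traded off online rather than through a fixed exploration schedule.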