Learning robot manipulation through deep reinforcement learning in environments with sparse rewards is a challenging task. In this paper, we address this problem by introducing the notion of imaginary object goals. For a given manipulation task, the object of interest is first trained to reach a desired target position on its own, without being manipulated, through physically realistic simulations. The object policy is then leveraged to build a predictive model of plausible object trajectories, which provides the robot with a curriculum of incrementally more difficult object goals to reach during training. The proposed algorithm, Follow the Object (FO), has been evaluated on seven MuJoCo environments requiring increasing degrees of exploration, and has achieved higher success rates than alternative algorithms. In particularly challenging learning scenarios, e.g., where the object's initial and target positions are far apart, our approach can still learn a policy, whereas competing methods currently fail.
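To make the curriculum idea concrete, the following is a minimal sketch (not the authors' implementation) of how a pretrained object-trajectory model could supply intermediate object goals: the robot is handed goals progressively farther along the predicted object path as training advances. All names here (predict_trajectory, imaginary_goal, the progress parameter) are hypothetical placeholders introduced for illustration.

```python
import numpy as np

def imaginary_goal(predict_trajectory, start, target, progress):
    """Pick an imaginary object goal along a predicted object trajectory.

    predict_trajectory: maps (start, target) -> array of waypoints, shape (T, 3)
    progress: float in [0, 1]; 0 yields an easy goal near the start position,
              1 yields the true target position.
    """
    waypoints = predict_trajectory(start, target)   # (T, 3) object positions
    idx = int(progress * (len(waypoints) - 1))      # move the goal along the path
    return waypoints[idx]

# Toy usage with a straight-line "trajectory model" standing in for the
# learned predictive model of plausible object trajectories:
straight = lambda s, t: np.linspace(s, t, num=50)
start, target = np.array([0.0, 0.0, 0.4]), np.array([1.0, 1.0, 0.4])
for progress in (0.0, 0.5, 1.0):  # stand-in for a training-progress schedule
    print(imaginary_goal(straight, start, target, progress))
```

In this sketch the scheduling signal (progress) would in practice be driven by the agent's recent success rate, so that goals only move toward the final target as earlier, easier goals are mastered.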