Training agents to autonomously learn how to use anthropomorphic robotic hands has the potential to lead to systems capable of performing a multitude of complex manipulation tasks in unstructured and uncertain environments. In this work, we first introduce a suite of challenging simulated manipulation tasks that current reinforcement learning and trajectory optimisation techniques find difficult. These include environments where two simulated hands have to pass or throw objects to each other, as well as an environment where the agent must learn to spin a long pen between its fingers. We then introduce a simple trajectory optimisation algorithm that performs significantly better than existing methods on these environments. Finally, on the challenging PenSpin task, we combine sub-optimal demonstrations generated through trajectory optimisation with off-policy reinforcement learning, obtaining performance that far exceeds that of either approach individually and effectively solving the environment. Videos of all of our results are available at: https://dexterous-manipulation.github.io/