Multi-step manipulation tasks in unstructured environments are extremely challenging for a robot to learn. Such tasks interleave high-level reasoning about the intermediate states that must be attained to achieve the overall task with low-level reasoning about which actions will yield those states. We propose a model-free deep reinforcement learning method to learn multi-step manipulation tasks. We introduce the Robotic Manipulation Network (RoManNet), a vision-based model architecture, to learn action-value functions and predict manipulation action candidates. We define a Task Progress based Gaussian (TPG) reward function that computes the reward based on actions that lead to successful motion primitives and on progress towards the overall task goal. To balance the exploration/exploitation ratio, we introduce a Loss Adjusted Exploration (LAE) policy that selects actions from the action candidates according to a Boltzmann distribution over loss estimates. We demonstrate the effectiveness of our approach by training RoManNet to learn several challenging multi-step robotic manipulation tasks in both simulation and the real world. Experimental results show that our method outperforms existing methods and achieves state-of-the-art performance in terms of success rate and action efficiency. The ablation studies show that TPG and LAE are especially beneficial for tasks like multiple block stacking. Code is available at: https://github.com/skumra/romannet
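To make the two mechanisms named above concrete, the following is a minimal, illustrative Python sketch of a Boltzmann (softmax) action selection over loss estimates and a Gaussian-shaped progress reward. The function names, the temperature parameter, the Gaussian width, and the assumption that higher estimated loss should receive more exploration probability are ours for illustration only and do not reproduce the exact formulation used in RoManNet.

```python
import numpy as np


def lae_select_action(action_candidates, loss_estimates, temperature=1.0, rng=None):
    """Illustrative Loss Adjusted Exploration (LAE) sketch: sample one action
    candidate with probability given by a Boltzmann (softmax) distribution over
    loss estimates. Here, candidates with higher estimated loss are sampled more
    often (more exploration); this direction and the temperature are assumptions."""
    rng = rng or np.random.default_rng()
    losses = np.asarray(loss_estimates, dtype=float)
    logits = losses / temperature
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(len(action_candidates), p=probs)
    return action_candidates[idx]


def tpg_reward(task_progress, primitive_success, sigma=0.2):
    """Illustrative Task Progress based Gaussian (TPG) reward sketch: a failed
    motion primitive receives zero reward; a successful one receives a Gaussian-
    shaped reward that peaks as task_progress approaches 1.0 (task complete).
    The width sigma and this exact functional form are assumptions."""
    if not primitive_success:
        return 0.0
    return float(np.exp(-((1.0 - task_progress) ** 2) / (2.0 * sigma ** 2)))


if __name__ == "__main__":
    # Hypothetical candidates and per-candidate loss estimates for demonstration.
    candidates = ["push_a", "grasp_b", "place_c"]
    losses = [0.8, 0.2, 0.5]
    print(lae_select_action(candidates, losses, temperature=0.5))
    print(tpg_reward(task_progress=0.75, primitive_success=True))
```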