"什么,不是怎么" - 解决从头到尾未充分激活的插入任务 ("What, not how" -- Solving an under-actuated insertion task from scratch)

Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most Reinforcement Learning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result, the skill ('how' to solve the task) but not the actual goal of a complete manipulation ('what' to solve) is specified. In contrast, we study a complex manipulation goal that requires an agent to learn and combine diverse manipulation skills. We propose a challenging, highly under-actuated peg-in-hole task with a free, rotationally asymmetric peg, requiring a broad range of manipulation skills. While correct peg (re-)orientation is a requirement for successful insertion, there is no reward associated with it. Hence, an agent needs to understand this pre-condition and learn the skill to fulfil it. The final insertion reward is sparse, allowing freedom in the solution and leading to complex emerging behaviour not envisioned during the task design. We tackle the problem in a multi-task RL framework using Scheduled Auxiliary Control (SAC-X) combined with Regularized Hierarchical Policy Optimization (RHPO), which successfully solves the task in simulation and from scratch on a single robot where data is severely limited.