Deep learning, combined with improved training techniques and high computational power, has driven recent advances in reinforcement learning (RL) and enabled successful robotic RL applications such as in-hand manipulation. However, most robotic RL relies on a well-known initial state distribution, and in real-world tasks this information is often unavailable. For example, when disentangling waste objects, the actual position of the robot relative to the objects may not match the positions the RL policy was trained for. To solve this problem, we present a novel adversarial reinforcement learning (ARL) framework. The framework uses an adversary that is trained to steer the original agent, the protagonist, into challenging states. We train the protagonist and the adversary jointly so that each can adapt to the changing policy of its opponent. We show that our method generalizes from training to test scenarios by training an end-to-end robot control system to solve a challenging object disentangling task. Experiments with a KUKA LBR+ 7-DOF robot arm show that our approach outperforms the baseline method in disentangling when starting from initial states that differ from those provided during training.
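The joint training scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy 1-D line world in place of the robot task and swaps in tabular Q-learning for deep RL. All names (`train`, `q_pro`, `q_adv`, `ADV_STEPS`, and so on) are hypothetical. In each episode the adversary first steers the agent for a few steps, then the protagonist tries to reach the goal; the adversary is rewarded with the negative of the protagonist's return, so both policies adapt to each other.

```python
import random

N_STATES = 11        # positions 0..10 on a line (toy stand-in for the robot task)
GOAL = 0             # the protagonist must reach position 0
ACTIONS = (-1, +1)   # move left / move right
START = 5            # nominal initial state before the adversary acts
ADV_STEPS = 5        # steps controlled by the adversary (assumed value)
PRO_STEPS = 20       # step budget of the protagonist (assumed value)


def step(state, action):
    """Move along the line, clipping at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def choose(q, state, eps, rng):
    """Epsilon-greedy action index for a tabular Q-function."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(episodes=2000, alpha=0.2, gamma=0.95, eps=0.2, seed=0):
    """Jointly train protagonist and adversary; returns both Q-tables."""
    rng = random.Random(seed)
    q_pro = [[0.0, 0.0] for _ in range(N_STATES)]
    q_adv = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = START
        # Phase 1: the adversary steers the agent toward a hard state.
        adv_trace = []
        for _ in range(ADV_STEPS):
            a = choose(q_adv, state, eps, rng)
            nxt = step(state, ACTIONS[a])
            adv_trace.append((state, a, nxt))
            state = nxt
        # Phase 2: the protagonist acts, paying -1 per step until the goal.
        pro_return = 0.0
        for _ in range(PRO_STEPS):
            if state == GOAL:
                break
            a = choose(q_pro, state, eps, rng)
            nxt = step(state, ACTIONS[a])
            done = nxt == GOAL
            target = -1.0 + (0.0 if done else gamma * max(q_pro[nxt]))
            q_pro[state][a] += alpha * (target - q_pro[state][a])
            pro_return += -1.0
            state = nxt
        # Adversary update: its only reward is -pro_return, received
        # at the end of its own phase (zero-sum with the protagonist).
        for i, (s, a, nxt) in enumerate(adv_trace):
            last = i == len(adv_trace) - 1
            target = -pro_return if last else gamma * max(q_adv[nxt])
            q_adv[s][a] += alpha * (target - q_adv[s][a])
    return q_pro, q_adv
```

Calling `train()` returns the two Q-tables. Because the adversary is rewarded only when the protagonist does poorly, the hand-over states shift as the protagonist improves, which is the adaptation effect that joint training is meant to produce.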