Intelligent agents must be able to think fast and slow to perform elaborate manipulation tasks. Reinforcement Learning (RL) has led to many promising results on a range of challenging decision-making tasks. However, in real-world robotics, these methods still struggle, as they require large amounts of expensive interaction data and suffer from slow feedback loops. On the other hand, fast human-like adaptive control methods can optimize complex robotic interactions, yet fail to integrate the multimodal feedback needed for unstructured tasks. In this work, we propose to factor the learning problem into a hierarchical learning and adaptation architecture to get the best of both worlds. The framework consists of two components: a slow reinforcement learning policy optimizing the task strategy given multimodal observations, and a fast, real-time adaptive control policy continuously optimizing the motion, stability, and effort of the manipulator. We combine these components through a bio-inspired action space that we call AFORCE. We demonstrate the new action space on a contact-rich manipulation task on real hardware and evaluate its performance on three simulated manipulation tasks. Our experiments show that AFORCE drastically improves sample efficiency while reducing energy consumption and improving safety.
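To make the slow/fast factorization concrete, the following is a minimal Python sketch, assuming a low-rate RL policy that emits setpoint-style actions and a high-rate impedance-like control law standing in for the adaptive layer. All class and function names here are hypothetical, and the sketch does not reproduce AFORCE's actual parameterization; it only illustrates how one slow decision is expanded into many fast control updates.

```python
import numpy as np

# Hypothetical names; illustrative sketch only, not the paper's implementation.

class FastAdaptiveController:
    """Real-time loop (e.g. 1 kHz) tracking setpoints from the slow policy."""
    def __init__(self, kp=200.0, kd=20.0):
        self.kp, self.kd = kp, kd

    def torque(self, q, dq, q_des):
        # Simple impedance-style law standing in for the adaptive controller.
        return self.kp * (q_des - q) - self.kd * dq


class SlowRLPolicy:
    """Low-rate policy (e.g. 10 Hz) mapping multimodal observations to
    setpoint-style actions (here: a joint-space target)."""
    def act(self, obs):
        # Placeholder for a learned policy; perturbs the current configuration.
        return obs["q"] + 0.01 * np.random.randn(*obs["q"].shape)


def control_step(obs, policy, controller, substeps=100):
    """One slow decision expanded into many fast control updates."""
    q_des = policy.act(obs)
    for _ in range(substeps):
        tau = controller.torque(obs["q"], obs["dq"], q_des)
        # apply tau to the robot or simulator and refresh obs["q"], obs["dq"] here
    return tau


if __name__ == "__main__":
    obs = {"q": np.zeros(7), "dq": np.zeros(7)}
    control_step(obs, SlowRLPolicy(), FastAdaptiveController())
```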