Data efficiency in robotic skill acquisition is crucial for operating robots in varied small-batch assembly settings. To operate in such environments, robots must have robust obstacle avoidance and versatile goal conditioning acquired from only a few simple demonstrations. Existing approaches, however, fall short of these requirements. Deep reinforcement learning (RL) enables a robot to learn complex manipulation tasks, but it is often limited to small task spaces in the real world due to sample inefficiency and safety concerns. Motion planning (MP) can generate collision-free paths in obstructed environments, but it cannot solve complex manipulation tasks and requires goal states, which are often specified by a user or an object-specific pose estimator. In this work, we propose a system for efficient skill acquisition that uses an object-centric generative model (OCGM) for versatile goal identification, passes the identified goal to MP, and combines MP with RL to solve complex manipulation tasks in obstructed environments. Specifically, the OCGM enables one-shot identification of the target object and its re-identification in new scenes, allowing MP to guide the robot to the target object while avoiding obstacles. This is combined with a skill transition network, which bridges the gap between the terminal states of MP and the feasible start states of a sample-efficient RL policy. The experiments demonstrate that our OCGM-based one-shot goal identification achieves accuracy competitive with baseline approaches and that our modular framework outperforms competitive baselines, including a state-of-the-art RL algorithm, by a significant margin on complex manipulation tasks in obstructed environments.
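The modular pipeline described above can be illustrated with a toy Python sketch. It is a hypothetical illustration, not the authors' implementation. Fixed embedding vectors stand in for OCGM latent codes. One-shot re-identification is approximated by cosine-similarity nearest-neighbour matching. Breadth-first search on an occupancy grid stands in for the motion planner. The learned skill transition network is replaced by a lookup of the nearest assumed RL start state. All function names, the grid, and the scene values are invented for illustration, and the RL manipulation policy itself is omitted.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def reidentify(demo_embedding, scene_objects):
    """Hypothetical stand-in for OCGM one-shot re-identification: return the
    scene object whose embedding best matches the single demo embedding."""
    return max(scene_objects, key=lambda obj: cosine(demo_embedding, obj["embedding"]))


def plan_path(grid, start, goal):
    """Stand-in for a motion planner: breadth-first search on a 4-connected
    occupancy grid (1 = obstacle). Returns (row, col) cells or None."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through parents to the start
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


def skill_transition(mp_terminal, rl_start_states):
    """Hypothetical stand-in for the learned skill transition network: map the
    MP terminal state to the closest (Manhattan distance) assumed RL start state."""
    return min(rl_start_states,
               key=lambda s: abs(s[0] - mp_terminal[0]) + abs(s[1] - mp_terminal[1]))


def run_pipeline(demo_embedding, scene_objects, grid, robot_start, rl_start_states):
    """Goal identification -> motion planning -> skill transition.
    The downstream RL manipulation policy is not modeled here."""
    target = reidentify(demo_embedding, scene_objects)
    path = plan_path(grid, robot_start, target["position"])
    if path is None:
        return target["name"], None, None
    return target["name"], path, skill_transition(path[-1], rl_start_states)


# Toy scene: row 3 is blocked except at columns 2 and 4.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
scene = [
    {"name": "mug", "embedding": [0.9, 0.1, 0.0], "position": (4, 0)},
    {"name": "box", "embedding": [0.0, 1.0, 0.2], "position": (0, 4)},
]
name, path, rl_start = run_pipeline([1.0, 0.0, 0.0], scene, grid, (0, 0), [(4, 1), (3, 4)])
print(name, len(path) - 1, "moves, RL starts at", rl_start)
```

Keeping each stage as a separate function mirrors the modular framing. Any stand-in could, in principle, be swapped for a real component without changing the others.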