A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics. To scale learning of skills to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner through trajectories rather than making instantaneous decisions individually at each time step. To this end, we propose the Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques of learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method utilizes sensors solely available during the execution of preliminarily learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task completion rewards. Videos, code, and pre-trained models are available at http://sac-gmm.cs.uni-freiburg.de.
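As a rough illustration of the hybrid idea described above, the following minimal sketch shows a skill represented as a GMM-based dynamical system whose parameters can be shifted by an external refinement signal. It is not the authors' implementation. The function names (`gmr_velocity`, `adapt_gmm`, `rollout`), the use of Gaussian mixture regression to obtain a velocity from a position, and the choice to adapt only priors and means by additive offsets are all assumptions made for illustration. In the full method such offsets would be produced by a Soft Actor-Critic agent, which is omitted here.

```python
import numpy as np


def gmr_velocity(x, priors, means, covs):
    """Gaussian mixture regression: expected velocity given position x.

    Assumed layout (hypothetical): each component models the joint
    [position, velocity], so `means` has shape (K, 2*d) and `covs`
    has shape (K, 2*d, 2*d).
    """
    d = x.shape[0]
    K = priors.shape[0]
    log_w = np.empty(K)      # unnormalized log responsibilities
    cond = np.empty((K, d))  # per-component conditional mean velocity
    for k in range(K):
        mu_x, mu_v = means[k, :d], means[k, d:]
        s_xx = covs[k, :d, :d]
        s_vx = covs[k, d:, :d]
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)
        _, logdet = np.linalg.slogdet(s_xx)
        log_w[k] = np.log(priors[k]) - 0.5 * (
            diff @ sol + logdet + d * np.log(2.0 * np.pi)
        )
        cond[k] = mu_v + s_vx @ sol
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    return w @ cond


def adapt_gmm(priors, means, delta_priors, delta_means):
    """Apply additive offsets to the GMM parameters (assumed refinement step).

    The offsets stand in for the action of a learned agent; priors are
    clipped and renormalized so they remain a valid distribution.
    """
    new_priors = np.clip(priors + delta_priors, 1e-6, None)
    new_priors = new_priors / new_priors.sum()
    return new_priors, means + delta_means


def rollout(x0, priors, means, covs, steps=50, dt=0.1):
    """Integrate the dynamical system with explicit Euler steps."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * gmr_velocity(x, priors, means, covs)
        traj.append(x.copy())
    return np.stack(traj)
```

In this sketch the refinement acts on the parameters of the trajectory distribution rather than on individual motor commands, which mirrors the abstract's contrast between adapting skills through trajectories and making decisions at each time step.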