A core challenge for an autonomous agent acting in the real world is adapting its repertoire of skills to cope with noisy perception and dynamics. To scale skill learning to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner through trajectories, rather than making an independent decision at each time step. To this end, we propose the Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques for learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method uses sensors that are available only during the execution of previously learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task-completion rewards. Videos, code, and pre-trained models are available at http://sac-gmm.cs.uni-freiburg.de.
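To make the idea of a skill encoded as a dynamical system and refined in its own parameter space more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian mixture over joint position and velocity, queried with Gaussian mixture regression, and it assumes that refinement shifts the mixture priors and means. The function names (`gmr_velocity`, `adapt_gmm`, `rollout`), the choice of adapted parameters, and the toy numbers are all hypothetical; the soft actor-critic agent that would propose the parameter changes from sensory input is not shown.

```python
import numpy as np


def gmr_velocity(x, priors, means, covs):
    """Gaussian mixture regression: expected velocity at position x.

    Component k models the joint [position, velocity] vector with mean
    means[k] (length 2d) and covariance covs[k] (shape 2d x 2d).
    """
    d = x.shape[0]
    weights = np.empty(len(priors))
    cond_means = np.empty((len(priors), d))
    for k, (pi, mu, sigma) in enumerate(zip(priors, means, covs)):
        mu_x, mu_v = mu[:d], mu[d:]
        s_xx, s_vx = sigma[:d, :d], sigma[d:, :d]
        diff = x - mu_x
        solved = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        # Responsibility of component k: pi_k * N(x | mu_x, s_xx)
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(s_xx))
        weights[k] = pi * np.exp(-0.5 * diff @ solved) / norm
        # Conditional mean of the velocity given the position
        cond_means[k] = mu_v + s_vx @ solved
    weights /= weights.sum()
    return weights @ cond_means


def adapt_gmm(priors, means, delta_priors, delta_means):
    """Apply a refinement step to the mixture parameters.

    Assumption: the deltas would come from a learned policy (for example
    a soft actor-critic agent); here they are passed in by hand.
    """
    new_priors = np.clip(priors + delta_priors, 1e-6, None)
    new_priors /= new_priors.sum()  # keep the priors a valid distribution
    return new_priors, means + delta_means


def rollout(x0, priors, means, covs, dt=0.1, steps=50):
    """Integrate the skill's dynamical system with explicit Euler steps."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        x = x + dt * gmr_velocity(x, priors, means, covs)
        path.append(x.copy())
    return np.array(path)


# Toy skill: one component whose conditional velocity is -(x - mu_x),
# so trajectories converge to the position mean.
priors = np.array([1.0])
means = np.zeros((1, 4))
covs = np.array([[[1.0, 0.0, -1.0, 0.0],
                  [0.0, 1.0, 0.0, -1.0],
                  [-1.0, 0.0, 2.0, 0.0],
                  [0.0, -1.0, 0.0, 2.0]]])

# Hypothetical refinement: move the attractor 0.5 units along the first axis.
new_priors, new_means = adapt_gmm(
    priors, means, np.array([0.0]), np.array([[0.5, 0.0, 0.0, 0.0]])
)
refined_path = rollout([1.0, 2.0], new_priors, new_means, covs)
```

In this sketch a single refinement step changes the whole trajectory the skill produces, which is the sense in which adaptation operates on trajectories instead of on individual per-step actions.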