General-purpose robotic systems must master a large repertoire of diverse skills to be useful in a range of daily tasks. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework new tasks can be continuously instantiated from previously learned tasks improving overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks, and acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at https://karolhausman.github.io/mt-opt/
翻译:通用机器人系统必须掌握大量的多种技能,才能在一系列日常任务中发挥作用。强化学习为获得个人行为提供了一个强大的框架,而获得每种技能所需的时间则使得拥有一个接受RL巨大培训的通用机器人的前景。在本文中,我们研究大规模集体机器人学习系统如何同时获得一套行为,共享探索、经验和跨任务表达方式。在这个框架中,新任务可以不断从以往学到的任务中从改进系统的整体性能和能力方面汲取大量回馈。为了即时化这个系统,我们开发了一个可缩放和直观的框架,通过用户提供的预期结果范例来具体确定新的任务,设计一个多机器人集体学习系统,同时收集多项任务的经验,并开发一个可缩放和可推广的多功能深度强化学习方法,我们称之为MT-Opt。我们演示MT-Opt如何从以前学到广泛的技能,包括Semmankarcick(即从某个特定类别中选取一个对象),从一个特定类别中选取新的任务,将一个具有相似性能的图像,放在各种固定系统上,将我们收集到一个更精确的、更精确的系统上、更精确的、更精确的系统,将一个我们所收集的、更精确的、更精确的、更精确的、更精确的系统上、更精确的、更精确的系统上、更精确的、更精确的系统上、更精确的系统上、更精确的系统上的数据。