MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale

General-purpose robotic systems must master a large repertoire of diverse skills to be useful in a range of daily tasks. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework, new tasks can be continuously instantiated from previously learned tasks, improving the overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks and its ability to acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at https://karolhausman.github.io/mt-opt/
