Robot learning holds the promise of learning policies that generalize broadly. However, such generalization requires sufficiently diverse datasets of the task of interest, which can be prohibitively expensive to collect. In other fields, such as computer vision, it is common to utilize shared, reusable datasets, such as ImageNet, to overcome this challenge, but this has proven difficult in robotics. In this paper, we ask: what would it take to enable practical data reuse in robotics for end-to-end skill learning? We hypothesize that the key is to use datasets with multiple tasks and multiple domains, such that a new user who wants to train their robot to perform a new task in a new domain can include this dataset in their training process and benefit from cross-task and cross-domain generalization. To evaluate this hypothesis, we collect a large multi-domain and multi-task dataset, with 7,200 demonstrations constituting 71 tasks across 10 environments, and empirically study how this data can improve the learning of new tasks in new environments. We find that jointly training with the proposed dataset and 50 demonstrations of a never-before-seen task in a new domain leads, on average, to a 2x improvement in success rate compared to using target domain data alone. We also find that data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains. These results suggest that reusing diverse multi-task and multi-domain datasets, including our open-source dataset, may pave the way for broader robot generalization, eliminating the need to re-collect data for each new robot learning project.