Deep visuomotor policy learning, which aims to map raw visual observations to actions, achieves promising results in control tasks such as robotic manipulation and autonomous driving. However, it requires a large number of online interactions with the training environment, which limits its real-world application. Compared to the widely studied unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks by watching hours of uncurated YouTube videos. Specifically, we train an inverse dynamics model on a small amount of labeled data and use it to predict pseudo-action labels for all the YouTube video frames. A new contrastive policy pretraining method is then developed to learn action-conditioned features from the video frames with pseudo-action labels. Experiments show that the resulting action-conditioned features bring substantial improvements to downstream reinforcement learning and imitation learning tasks, outperforming weights pretrained with previous unsupervised learning methods as well as ImageNet-pretrained weights. Code, model weights, and data are available at: https://metadriverse.github.io/ACO.
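To make the action-conditioned contrastive step concrete, below is a minimal sketch of one plausible formulation: frames whose pseudo-actions (predicted by the inverse dynamics model) are close are treated as positive pairs in a supervised-contrastive-style objective. The function name `aco_loss`, the `action_threshold` hyperparameter, and the exact loss form are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def aco_loss(features, actions, action_threshold=0.1, temperature=0.07):
    """Sketch of an action-conditioned contrastive loss (assumed form).

    features: (N, D) frame embeddings from the encoder being pretrained.
    actions:  (N, A) pseudo-action labels (e.g., steering angle) predicted
              by the inverse dynamics model for each frame.
    Pairs of frames whose pseudo-actions differ by less than
    `action_threshold` are treated as positives.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                 # (N, N) similarity logits

    # Positive mask from pseudo-action proximity, excluding self-pairs.
    dist = torch.cdist(actions, actions)          # (N, N) action distances
    pos = (dist < action_threshold).float()
    pos.fill_diagonal_(0)

    # Supervised-contrastive-style objective over action-defined positives.
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # stability
    mask_self = 1 - torch.eye(len(z), device=z.device)
    log_prob = logits - torch.log(
        (torch.exp(logits) * mask_self).sum(dim=1, keepdim=True))
    denom = pos.sum(dim=1).clamp(min=1)           # avoid divide-by-zero
    return -((pos * log_prob).sum(dim=1) / denom).mean()
```

In this formulation, the encoder is pulled toward grouping frames that elicit similar driving actions, which is what distinguishes action-conditioned pretraining from purely instance-level contrastive methods.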