Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we ask whether this idea can be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Because the input is highly dynamic and variable, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains a large amount of information irrelevant to decision making, which makes predominant pre-training approaches from general vision less suitable for autonomous driving. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is trained in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, taking two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion from the current visual observation only, optimized with the photometric error. As a result, the pre-trained visual encoder carries rich driving-policy-related representations and is therefore competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
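To make the photometric objective concrete, the following is a minimal NumPy sketch of a photometric reprojection error between two frames, given a depth map and a relative pose. It is an illustration under stated assumptions and not the PPGeo implementation: the function names `backproject` and `photometric_error` are hypothetical, the intrinsics matrix `K` is assumed to be known (the videos described above are uncalibrated, so in practice it would have to be estimated), images are assumed to be grayscale arrays, and nearest-neighbour sampling stands in for the differentiable bilinear sampling a trainable pipeline would need.

```python
import numpy as np


def backproject(depth, K):
    """Lift every pixel of an (H, W) depth map to a 3D point in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    return (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # shape (3, H*W)


def photometric_error(target, source, depth, K, T):
    """Mean L1 error between `target` and `source` warped into the target view.

    target, source: (H, W) grayscale images.
    depth:          (H, W) depth of the target frame.
    K:              3x3 intrinsics (assumed known in this sketch).
    T:              4x4 transform taking target-frame points to the source frame.
    """
    h, w = depth.shape
    pts_src = T[:3, :3] @ backproject(depth, K) + T[:3, 3:4]
    proj = K @ pts_src
    z = proj[2]
    valid = z > 1e-6  # keep only points in front of the source camera
    safe_z = np.where(valid, z, 1.0)
    # Nearest-neighbour lookup; a trainable version would sample bilinearly.
    ui = np.rint(proj[0] / safe_z).astype(int)
    vi = np.rint(proj[1] / safe_z).astype(int)
    valid &= (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    if not valid.any():
        return float("nan")
    warped = source[np.clip(vi, 0, h - 1), np.clip(ui, 0, w - 1)]
    return float(np.abs(warped - target.reshape(-1))[valid].mean())
```

In a two-stage setup like the one described above, `depth` and `T` would both come from networks in the first stage, while in the second stage `T` would be predicted from the current frame alone; the error is low only when the predicted geometry and motion explain how the scene moves between frames.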