Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling

Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive amounts of information irrelevant to decision making, which makes predominant pre-training approaches from general vision less suitable for autonomous driving. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is trained in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, taking two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion from the current visual observation only, and is optimized with the photometric error. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of the proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
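The abstract does not spell out how the photometric error is computed, so the following is only a minimal sketch of the standard view-synthesis formulation used in self-supervised depth and ego-motion learning, not the PPGeo implementation. It assumes grayscale images stored as nested lists, known pinhole intrinsics `(fx, fy, cx, cy)` (the videos in the paper are uncalibrated, so this is a simplification), and a rigid motion `(rotation, translation)` that maps points from the target camera frame to the source camera frame. All function names are hypothetical.

```python
import math


def bilinear_sample(image, u, v):
    """Bilinearly sample a grayscale image at (u, v); return None outside the image."""
    height, width = len(image), len(image[0])
    if not (0 <= u <= width - 1 and 0 <= v <= height - 1):
        return None
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * image[v0][u0] + du * image[v0][u1]
    bottom = (1 - du) * image[v1][u0] + du * image[v1][u1]
    return (1 - dv) * top + dv * bottom


def warp_source_to_target(source, depth, intrinsics, rotation, translation):
    """Synthesize the target view by sampling the source frame at reprojected pixels.

    Entries are None where the reprojection lands behind the camera or
    outside the source image.
    """
    fx, fy, cx, cy = intrinsics  # assumed known here; an assumption of this sketch
    height, width = len(depth), len(depth[0])
    warped = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            # Back-project the target pixel to a 3D point in the target camera frame.
            point = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Move the point into the source camera frame: p' = R p + t.
            xs, ys, zs = [
                sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
                for i in range(3)
            ]
            if zs <= 0:
                continue
            # Project into the source image and sample its intensity.
            warped[v][u] = bilinear_sample(source, fx * xs / zs + cx, fy * ys / zs + cy)
    return warped


def photometric_error(target, warped):
    """Mean absolute intensity difference over pixels with a valid reprojection."""
    diffs = [
        abs(t - w)
        for target_row, warped_row in zip(target, warped)
        for t, w in zip(target_row, warped_row)
        if w is not None
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

In a training loop the depth and motion would come from networks, and this error would be the signal that supervises them: a predicted ego-motion that explains the change between two frames yields a lower error than one that does not, which is what lets the second stage learn from the current frame alone without labels.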
