Effectively utilizing the vast amount of egocentric navigation data freely available on the internet can advance generalized intelligent systems, i.e., systems that robustly scale across perspectives, platforms, environmental conditions, scenarios, and geographical locations. However, it is difficult to directly leverage such large amounts of unlabeled and highly diverse data for complex 3D reasoning and planning tasks. Consequently, researchers have primarily used such data for auxiliary pixel- and image-level computer vision tasks that do not consider an ultimate navigational objective. In this work, we introduce SelfD, a framework for learning scalable driving policies from large amounts of online monocular images. Our key idea is to leverage iterative semi-supervised training when learning imitative agents from unlabeled data. To handle unconstrained viewpoints, scenes, and camera parameters, we train an image-based model that directly learns to plan in the Bird's Eye View (BEV) space. Next, we use unlabeled data to augment the decision-making knowledge and robustness of an initially trained model via self-training. In particular, we propose a pseudo-labeling step that makes full use of highly diverse demonstration data through "hypothetical" planning-based data augmentation. We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios. Without requiring any additional data collection or annotation effort, SelfD demonstrates consistent improvements (of up to 24%) in driving performance on nuScenes, Argoverse, Waymo, and CARLA.
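The iterative self-training with "hypothetical" pseudo-labels described above can be illustrated with a toy sketch. This is not the SelfD implementation: it assumes the planner is conditioned on a discrete high-level command (the values in the hypothetical COMMANDS tuple stand for left / straight / right), replaces images with scalar features and BEV plans with a scalar waypoint, and uses a per-command least-squares line (the hypothetical fit and predict helpers) in place of a neural network. The assumed form of the augmentation is that every unlabeled input is pseudo-labeled under every command, not only the one actually driven.

```python
from statistics import fmean

# Hypothetical discrete navigation commands (assumption: the planner is
# conditioned on a high-level command such as left / straight / right).
COMMANDS = (-1, 0, 1)


def fit(samples):
    """Fit one least-squares line y = a*x + b per command.

    Toy stand-in for training an image-to-BEV planner on
    (input, command, plan) triples.
    """
    model = {}
    for c in COMMANDS:
        pts = [(x, y) for x, cmd, y in samples if cmd == c]
        if not pts:
            model[c] = (0.0, 0.0)  # no data for this command: trivial plan
            continue
        mx = fmean(x for x, _ in pts)
        my = fmean(y for _, y in pts)
        var = sum((x - mx) ** 2 for x, _ in pts)
        a = sum((x - mx) * (y - my) for x, y in pts) / var if var else 0.0
        model[c] = (a, my - a * mx)
    return model


def predict(model, x, command):
    """Predict the (toy, scalar) plan for input x under a given command."""
    a, b = model[command]
    return a * x + b


def self_train(labeled, unlabeled, rounds=2):
    """Iterative pseudo-labeling: fit on labels, label the rest, refit."""
    model = fit(labeled)  # initial model trained on labeled demos only
    pseudo = []
    for _ in range(rounds):
        # Assumed "hypothetical" augmentation: pseudo-label each unlabeled
        # input under every command, then retrain on labeled + pseudo data.
        pseudo = [(x, c, predict(model, x, c)) for x in unlabeled for c in COMMANDS]
        model = fit(labeled + pseudo)
    return model, pseudo


if __name__ == "__main__":
    labeled = [(0.0, -1, -1.0), (1.0, -1, -2.0),
               (0.0, 0, 0.0), (1.0, 0, 1.0),
               (0.0, 1, 1.0), (1.0, 1, 4.0)]
    model, pseudo = self_train(labeled, unlabeled=[2.0, 3.0])
    print(len(pseudo), predict(model, 2.0, 1))
```

Because this toy model is linear, the pseudo-labels lie exactly on the fitted lines and retraining leaves the fit unchanged; the sketch only shows the structure of the loop (initial supervised fit, multi-command pseudo-labeling of unlabeled inputs, retraining on the union). Any real benefit depends on a higher-capacity model that generalizes differently once trained on the enlarged, more diverse set.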