Autonomous driving relies on huge volumes of real-world data labeled with high precision. Alternative solutions seek to exploit driving simulators, which can generate large amounts of labeled data with a plethora of content variations. However, the domain gap between synthetic and real data remains, raising the following important question: what are the best ways to utilize a self-driving simulator for perception tasks? In this work, we build on recent advances in domain-adaptation theory and, from this perspective, propose ways to minimize the reality gap. We primarily focus on the setting in which labels are used in the synthetic domain alone. Our approach introduces both a principled way to learn neural-invariant representations and a theoretically inspired view on how to sample data from the simulator. Our method is easy to implement in practice, as it is agnostic to the network architecture and the choice of simulator. We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data (cameras, lidar) using an open-source simulator (CARLA), and evaluate the entire framework on a real-world dataset (nuScenes). Finally, we show which types of variation (e.g. weather conditions, number of assets, map design, and color diversity) matter to perception networks trained with driving simulators, and which ones can be compensated for with our domain-adaptation technique.