To facilitate a real-world, ever-evolving, and scalable autonomous driving system, we present a large-scale benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches that learn from raw data; it is the first and largest such benchmark to date. Existing autonomous driving systems rely heavily on "perfect" visual perception models (e.g., detection) trained on extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances of all scenarios and circumstances (e.g., night, extreme weather, cities) when deploying a robust autonomous driving system. Given recent advances in self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by jointly exploiting large-scale unlabeled data and a small amount of labeled data. Existing datasets (e.g., KITTI, Waymo) either provide only a small amount of data or cover limited domains with full annotation, hindering the exploration of large-scale pre-trained models. Here, we release a Large-Scale Object Detection benchmark for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at one frame every ten seconds across 32 different cities under different weather conditions, periods, and location scenes. We provide extensive experiments and in-depth analyses of existing supervised state-of-the-art detection models and popular self-supervised and semi-supervised approaches, along with some insights into how to develop future models. The data and more up-to-date information have been released at https://soda-2d.github.io.