Monocular 3D object detection is a challenging task in the self-driving and computer vision community. As a common practice, most previous works use manually annotated 3D box labels, whose annotation process is expensive. In this paper, we find that precisely and carefully annotated labels may be unnecessary in monocular 3D detection, which is an interesting and counterintuitive finding. Using rough labels that are randomly perturbed, the detector achieves accuracy very close to that of a detector trained with ground-truth labels. We delve into this underlying mechanism and empirically find that, regarding label accuracy, the 3D location is the most critical part of the label compared to the other parts. Motivated by these conclusions and considering the precise 3D measurements provided by LiDAR, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). This framework can either reduce the annotation cost or considerably boost detection accuracy without introducing extra annotation cost. Specifically, it generates pseudo labels from unlabeled LiDAR point clouds. Thanks to accurate LiDAR 3D measurements, such pseudo labels can replace manually annotated labels when training monocular 3D detectors, since their 3D location information is precise. LPCG can be applied to any monocular 3D detector to fully use the massive unlabeled data available in a self-driving system. As a result, on the KITTI benchmark, we take first place on both monocular 3D and BEV (bird's-eye-view) detection with a significant margin. On the Waymo benchmark, our method using 10% of the labeled data achieves accuracy comparable to the baseline detector using 100% of the labeled data. The code is released at https://github.com/SPengLiang/LPCG.
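To make the pseudo-labeling idea concrete, the following is a minimal sketch of generating rough 3D box pseudo labels from unlabeled LiDAR points, which could then supervise a monocular 3D detector. The helper names (fit_pseudo_box, generate_pseudo_labels) and the simple min/max box fit over pre-segmented object clusters are illustrative assumptions, not the paper's exact procedure; see the released code at https://github.com/SPengLiang/LPCG for the actual pipeline.

```python
# Minimal sketch (assumption: a LiDAR detector or clustering step already
# provides per-object point clusters; the box fit below is a simplified
# axis-aligned min/max fit rather than the paper's exact procedure).
import numpy as np


def fit_pseudo_box(object_points: np.ndarray) -> dict:
    """Fit a rough 3D box (center, size, yaw=0) to one object's LiDAR points.

    object_points: (N, 3) array of x, y, z coordinates.
    Returns a pseudo label usable in place of a manual 3D box annotation.
    """
    mins, maxs = object_points.min(axis=0), object_points.max(axis=0)
    center = (mins + maxs) / 2.0  # 3D location: the part of the label that matters most
    size = maxs - mins            # rough dimensions derived from the point extent
    return {"location": center, "dimensions": size, "rotation_y": 0.0}


def generate_pseudo_labels(unlabeled_scans):
    """Turn unlabeled LiDAR scans into pseudo 3D box labels for monocular training."""
    pseudo_labels = []
    for object_clusters in unlabeled_scans:  # each scan: list of (N_i, 3) clusters
        pseudo_labels.append([fit_pseudo_box(pts) for pts in object_clusters])
    return pseudo_labels


if __name__ == "__main__":
    # Toy example: one scan containing a single object cluster.
    rng = np.random.default_rng(0)
    cluster = rng.normal(loc=[5.0, 0.0, 20.0], scale=[1.0, 0.5, 2.0], size=(200, 3))
    labels = generate_pseudo_labels([[cluster]])
    print(labels[0][0])  # pseudo box that would be fed to a monocular 3D detector
```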