This paper focuses on monocular 3D object detection, one of the essential modules in autonomous driving systems. A key challenge is that the depth recovery problem is ill-posed for monocular data. In this work, we first conduct a thorough analysis to reveal how existing methods fail to robustly estimate depth when different geometric shifts occur. In particular, through a series of image-based and instance-based manipulations applied to current detectors, we show that existing detectors fail to capture the consistent relationships between depth and both apparent object sizes and positions. To alleviate this issue and improve the robustness of detectors, we convert the aforementioned manipulations into four corresponding 3D-aware data augmentation techniques. At the image level, we randomly manipulate the camera system, including its focal length, receptive field, and location, to generate new training images with geometric shifts. At the instance level, we crop foreground objects and randomly paste them into other scenes to generate new training instances. All the proposed augmentation techniques share the virtue that geometric relationships within objects are preserved while their overall geometry is manipulated. With the proposed data augmentation methods, not only is the instability of depth recovery effectively alleviated, but the final 3D detection performance is also significantly improved. This leads to significant gains on the KITTI and nuScenes monocular 3D detection benchmarks, achieving state-of-the-art results.
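To make the image-level camera manipulation concrete, here is a minimal sketch of one such augmentation under the standard pinhole model. The assumption (ours, not a detail stated in the abstract) is that resizing an image by a factor s while keeping the intrinsics K fixed acts like scaling the focal length by s, so the labels stay geometrically consistent if the object's depth becomes Z/s and its lateral position is re-derived by back-projecting the resized 2D center. The function and variable names below are illustrative, not from the paper's code.

```python
import numpy as np

def resize_as_focal_change(K, center3d, s):
    """Hypothetical focal-length-style augmentation label update.

    K:        3x3 pinhole intrinsics matrix
    center3d: (X, Y, Z) object center in camera coordinates
    s:        image resize factor (acts like scaling focal length by s)
    Returns the 3D center label consistent with the resized image.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X, Y, Z = center3d
    # Project the original 3D center to pixel coordinates.
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    # Resizing the image by s scales all pixel coordinates by s.
    u2, v2 = s * u, s * v
    # Depth that keeps the apparent size consistent under fixed K:
    # apparent height fy*H/Z grows by s exactly when Z -> Z/s.
    Z2 = Z / s
    # Back-project the new 2D center at the new depth.
    X2 = (u2 - cx) * Z2 / fx
    Y2 = (v2 - cy) * Z2 / fy
    return np.array([X2, Y2, Z2])
```

The key property is that the adjusted 3D center reprojects exactly onto the resized 2D center, so the depth/apparent-size relationship the detector learns remains consistent across augmented samples.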