In this paper, we tackle the challenging problem of monocular 3D object detection with a general semi-supervised framework. Specifically, having observed that the bottleneck of this task lies in the lack of reliable and informative samples for training the detector, we introduce a novel, simple, yet effective `Augment and Criticize' framework that mines abundant informative samples from unlabeled data to learn more robust detection models. In the `Augment' stage, we present Augmentation-based Prediction aGgregation (APG), which aggregates detections from various automatically learned augmented views to improve the robustness of pseudo-label generation. Since not all pseudo labels from APG are beneficial, the subsequent `Criticize' stage is introduced. In particular, we propose the Critical Retraining Strategy (CRS), which, unlike the fixed-threshold filtering of pseudo labels (e.g., by classification score) common in 2D semi-supervised tasks, leverages a learnable network to evaluate the contribution of unlabeled images at different training timestamps. In this way, noisy samples that hinder model evolution can be effectively suppressed. To validate our framework, we apply it to MonoDLE and MonoFlex. The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI, demonstrating the framework's effectiveness and generality. Code and models will be released.
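To make the idea of aggregating detections across augmented views more concrete, the following is a minimal sketch and not the APG procedure itself, whose details the abstract does not give. It assumes that detections from each view have already been mapped back to a common camera frame, that a detection is reduced to a 3D center and a score, and that agreement is measured by a simple distance radius. The names `Det`, `aggregate_views`, `radius`, and `min_votes` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class Det:
    """Hypothetical simplified detection: 3D box center plus confidence."""
    center: Tuple[float, float, float]  # (x, y, z) in a shared camera frame
    score: float                        # classification confidence in [0, 1]


def aggregate_views(views: List[List[Det]],
                    radius: float = 1.0,
                    min_votes: int = 2) -> List[Det]:
    """Merge detections from several augmented views into pseudo labels.

    Assumption (not from the paper): detections are greedily clustered
    around the highest-scoring seed within `radius`, and a cluster is kept
    only if at least `min_votes` distinct views contributed to it.
    """
    # Pool (view index, detection) pairs and visit confident ones first.
    pooled = [(i, d) for i, view in enumerate(views) for d in view]
    pooled.sort(key=lambda pair: pair[1].score, reverse=True)

    clusters: List[List[Tuple[int, Det]]] = []
    for view_idx, det in pooled:
        for cluster in clusters:
            seed = cluster[0][1]
            if dist(det.center, seed.center) <= radius:
                cluster.append((view_idx, det))
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([(view_idx, det)])

    pseudo_labels = []
    for cluster in clusters:
        distinct_views = {view_idx for view_idx, _ in cluster}
        if len(distinct_views) < min_votes:
            continue  # too few views agree; treat as unreliable
        dets = [d for _, d in cluster]
        total = sum(d.score for d in dets)
        # Score-weighted average of the member centers.
        center = tuple(
            sum(d.score * d.center[k] for d in dets) / total for k in range(3)
        )
        pseudo_labels.append(Det(center, total / len(dets)))
    return pseudo_labels
```

Calling `aggregate_views` on a list with one detection list per augmented view returns only the objects on which several views agree. Note that the hard `min_votes` cutoff here is a fixed rule; the abstract's `Criticize' stage instead uses a learnable network to weigh unlabeled samples, which this sketch does not attempt to model.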