Detecting 3D objects with a single perspective camera is a challenging problem. Anchor-free and keypoint-based models have received increasing attention recently due to their effectiveness and simplicity. However, most of these methods are vulnerable to occluded and truncated objects. In this paper, a single-stage monocular 3D object detection model is proposed. An instance-segmentation head is integrated into model training, which makes the model aware of the visible shape of the target object. As a result, detection largely avoids interference from irrelevant regions surrounding the target objects. In addition, we show that the popular IoU-based evaluation metrics, which were originally designed for evaluating stereo- or LiDAR-based detection methods, are insensitive to improvements in monocular 3D object detection algorithms. A novel evaluation metric, namely average depth similarity (ADS), is proposed for monocular 3D object detection models. Our method outperforms the baseline on both the popular and the proposed evaluation metrics while maintaining real-time efficiency.
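Why IoU-based metrics can be insensitive to monocular depth improvements can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's analysis, and it does not implement ADS, whose definition is not given in this abstract. It assumes axis-aligned bird's-eye-view (BEV) boxes and a hypothetical 1.6 m x 4.0 m car; official benchmark evaluation uses rotated boxes, so the numbers are only indicative.

```python
def overlap_1d(a_min, a_max, b_min, b_max):
    """Length of the overlap between intervals [a_min, a_max] and [b_min, b_max]."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def bev_iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned BEV boxes given as (x, z, width, length).

    x is the lateral centre and z the depth centre (metres); width spans x and
    length spans z. Axis alignment is a simplifying assumption of this sketch.
    """
    xa, za, wa, la = box_a
    xb, zb, wb, lb = box_b
    ix = overlap_1d(xa - wa / 2, xa + wa / 2, xb - wb / 2, xb + wb / 2)
    iz = overlap_1d(za - la / 2, za + la / 2, zb - lb / 2, zb + lb / 2)
    inter = ix * iz
    union = wa * la + wb * lb - inter
    return inter / union


# Hypothetical ground-truth car: 1.6 m wide, 4.0 m long, centred 30 m ahead.
gt = (0.0, 30.0, 1.6, 4.0)
for depth_error in [0.5, 1.0, 2.0, 4.0, 6.0]:
    # Same-size prediction, displaced only along the depth axis.
    pred = (0.0, 30.0 + depth_error, 1.6, 4.0)
    iou = bev_iou_axis_aligned(gt, pred)
    print(f"depth error {depth_error:.1f} m -> BEV IoU {iou:.3f}")
```

For equal-size boxes shifted by d metres along depth, this IoU equals (4 - d)/(4 + d): about 0.6 at d = 1 m and exactly 0 for any d >= 4 m. A model that reduces its depth error from 6 m to 4 m therefore gains nothing under an IoU threshold, which is the kind of insensitivity a depth-based metric is meant to expose.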