Current multi-view 3D object detection methods often fail to detect objects in the overlap region between adjacent cameras, and the networks' understanding of the scene is often limited to that of a monocular detection network. Moreover, objects in the overlap region are often heavily occluded or deformed by camera distortion, which causes a domain shift. To mitigate these issues, we propose two main modules: (1) Stereo Disparity Estimation for Weak Depth Supervision and (2) an Adversarial Overlap Region Discriminator. The former uses a traditional stereo disparity estimation method to obtain reliable disparity information from the overlap region. With the disparity estimates as supervision, we regularize the network so that it fully exploits the geometric potential of binocular images, which in turn improves overall detection accuracy. The latter module minimizes the representational gap between non-overlapping and overlapping regions. We demonstrate the effectiveness of the proposed method on nuScenes, a large-scale multi-view 3D object detection dataset. Our experiments show that the proposed method outperforms current state-of-the-art models, i.e., DETR3D and BEVDet.
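
To illustrate how disparity estimates could act as weak depth supervision, the following is a minimal sketch and not the method as implemented in the paper. It assumes a rectified image pair, for which depth relates to disparity as depth = focal length × baseline / disparity, and it assumes a masked L1 penalty restricted to overlap-region pixels. The function names, the choice of loss, and all parameters (`focal_px`, `baseline_m`, `min_disp`) are hypothetical; the abstract does not specify them.

```python
from typing import Optional, Sequence


def disparity_to_depth(
    disparity_px: float,
    focal_px: float,
    baseline_m: float,
    min_disp: float = 1e-3,
) -> Optional[float]:
    """Convert a disparity (pixels) to a depth (metres) for a rectified pair.

    Returns None when the disparity is too small to give a usable depth.
    """
    if disparity_px <= min_disp:
        return None
    return focal_px * baseline_m / disparity_px


def weak_depth_loss(
    pred_depth_m: Sequence[float],
    disparity_px: Sequence[float],
    in_overlap: Sequence[bool],
    focal_px: float,
    baseline_m: float,
) -> float:
    """Mean absolute depth error over overlap pixels with valid disparity.

    Assumption: a masked L1 loss; the actual regularizer may differ.
    """
    errors = []
    for pred, disp, overlap in zip(pred_depth_m, disparity_px, in_overlap):
        if not overlap:
            continue  # supervise only inside the overlap region
        target = disparity_to_depth(disp, focal_px, baseline_m)
        if target is None:
            continue  # skip pixels without a reliable disparity
        errors.append(abs(pred - target))
    # With no supervised pixels the loss contributes nothing.
    return sum(errors) / len(errors) if errors else 0.0
```

In practice, adjacent surround-view cameras are not a rectified stereo rig, so a real pipeline would need rectification or a more general triangulation step before a relation like this applies.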