Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D detection by enhancing depth perception with depth estimation networks and by using LiDAR-based 3D detection architectures. Advanced stereo 3D detectors can also accurately localize 3D objects. The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods, namely image-level generation, feature-level generation, and feature-clone, for detecting 3D objects from a single image. Our analysis of depth-aware learning shows that, in our framework, the depth loss is effective only in feature-level virtual view generation, whereas the estimated depth map is effective in both image-level and feature-level generation. We propose a disparity-wise dynamic convolution with dynamic kernels sampled from the disparity feature map to adaptively filter the features of a single image for generating virtual image features, which eases the feature degradation caused by depth estimation errors. As of submission (November 18, 2021), our Pseudo-Stereo 3D detection framework ranks 1st on car, pedestrian, and cyclist among the published monocular 3D detectors on the KITTI-3D benchmark. The code is released at https://github.com/revisitq/Pseudo-Stereo-3D.
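To make the idea of "dynamic kernels sampled from the disparity feature map" more concrete, the following is a minimal single-channel sketch of one possible reading of it, not the released implementation. It assumes that the kernel weights at each location are read from a 3x3 neighbourhood of the disparity feature map by grid shifting, multiplied element-wise with the equally shifted left-image features, and averaged over the nine shifts. The function names `shift2d` and `disparity_wise_dynamic_conv`, the 3x3 window, the zero padding, and the averaging are all assumptions made for illustration.

```python
def shift2d(fmap, dy, dx, pad=0.0):
    """Return a copy of fmap where out[y][x] = fmap[y + dy][x + dx].

    Positions that fall outside the map are filled with `pad`.
    """
    h, w = len(fmap), len(fmap[0])
    return [
        [
            fmap[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else pad
            for x in range(w)
        ]
        for y in range(h)
    ]


def disparity_wise_dynamic_conv(left_feat, disp_feat):
    """Filter left-image features with weights taken from the disparity features.

    Hypothetical sketch: for each of the 9 offsets in a 3x3 window, both maps
    are shifted by the same offset, multiplied element-wise, and the products
    are averaged. The disparity features therefore act as a spatially varying
    (dynamic) kernel instead of a fixed learned one.
    """
    h, w = len(left_feat), len(left_feat[0])
    out = [[0.0] * w for _ in range(h)]
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for dy, dx in offsets:
        shifted_left = shift2d(left_feat, dy, dx)
        shifted_disp = shift2d(disp_feat, dy, dx)
        for y in range(h):
            for x in range(w):
                out[y][x] += shifted_left[y][x] * shifted_disp[y][x]
    n = float(len(offsets))
    return [[v / n for v in row] for row in out]


if __name__ == "__main__":
    # Toy 3x3 maps; real feature maps would have many channels and come from
    # a backbone network and a disparity (or depth) estimation branch.
    left = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    disp = [[0.5, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 0.5]]
    for row in disparity_wise_dynamic_conv(left, disp):
        print(row)
```

Because the weights come from the disparity features rather than from fixed parameters, a location with unreliable disparity can be down-weighted by its neighbours, which is one intuition for why such filtering could ease degradation caused by depth estimation errors.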