We present the first learning-based framework for category-level 3D object detection and implicit shape estimation from a pair of stereo RGB images in the wild. Traditional stereo 3D object detection approaches describe detected objects only with 3D bounding boxes and cannot infer their full surface geometry, which makes it difficult to create a realistic outdoor immersive experience. In contrast, we propose a new model, S-3D-RCNN, that performs precise localization and provides a complete, resolution-agnostic shape description for the detected objects. We first decouple the estimation of object coordinate systems from shape reconstruction using a global-local framework. We then propose a new instance-level network that addresses the unseen surface hallucination problem by extracting point-based representations from stereo regions of interest, and infers implicit shape codes with predicted complete surface geometry. Extensive experiments on the KITTI benchmark, using existing and new metrics, validate our approach's superior performance. Code and pre-trained models will be available at this https URL.
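To make the terms "decoupled object coordinate system" and "resolution-agnostic shape description" concrete, the following is a minimal illustrative sketch and not the S-3D-RCNN implementation. It assumes a hypothetical pose given by a yaw angle about the vertical axis plus an object center, and it substitutes a three-number ellipsoid for the learned implicit shape code; all function names are invented for illustration. The point is only that a shape defined as a function in the object's canonical frame can be queried at any sampling density, independently of where the object sits in the scene.

```python
import math

def world_to_object(point, yaw, center):
    """Map a world-frame point into the object's canonical frame.

    Assumption: the pose is a rotation by `yaw` about the vertical (y) axis
    followed by a translation to `center`; this applies the inverse.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dz = point[2] - center[2]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx - s * dz, dy, s * dx + c * dz)

def occupancy(p, shape_code):
    """Toy implicit shape: an ellipsoid whose half-extents act as the 'code'.

    A learned model would replace this with a network conditioned on a
    latent code; the query interface (point in, inside/outside out) is the same.
    """
    a, b, c = shape_code
    return (p[0] / a) ** 2 + (p[1] / b) ** 2 + (p[2] / c) ** 2 <= 1.0

def occupied_in_world(point, yaw, center, shape_code):
    """Global step (pose) and local step (shape) composed for one query."""
    return occupancy(world_to_object(point, yaw, center), shape_code)

def estimate_volume(shape_code, resolution):
    """Sample the same implicit function on a grid of any chosen resolution."""
    step = 2.0 / resolution
    coords = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    inside = sum(
        occupancy((x, y, z), shape_code)
        for x in coords for y in coords for z in coords
    )
    return inside * step ** 3

if __name__ == "__main__":
    code = (0.9, 0.4, 0.45)          # hypothetical shape code
    center = (2.0, 1.0, 10.0)        # hypothetical object center
    print(occupied_in_world(center, 0.3, center, code))
    for res in (8, 16, 32):
        print(res, estimate_volume(code, res))
```

Because the shape is stored as a function rather than a fixed voxel grid or mesh, increasing `resolution` refines the sampled surface without changing the shape code, and changing the pose moves the object without touching the shape.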