We present the first learning-based framework for category-level 3D object detection and implicit shape estimation from a pair of stereo RGB images in the wild. Previous stereo 3D object detection approaches cannot describe the complete shape details of the detected objects and often fail on small objects. In contrast, we propose a new progressive approach that can (1) perform precise localization while providing a complete and resolution-agnostic shape description for the detected objects and (2) produce significantly more accurate orientation predictions for tiny instances. This approach features a new instance-level network that explicitly models the unseen surface hallucination problem using point-based representations and uses a new geometric representation for orientation refinement. Extensive experiments show that our approach achieves state-of-the-art performance across various metrics on the KITTI benchmark. Code and pre-trained models will be available at this https URL.