We present S-3D-RCNN, a new learning-based framework that recovers accurate object orientation in SO(3) and simultaneously predicts implicit shapes for outdoor rigid objects from stereo RGB images. In contrast to previous studies that map local appearance to observation angles, we explore a progressive approach that extracts meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric object orientation. The approach features a deep model that transforms perceived intensities into object part coordinates, which are then mapped to a 3D representation encoding object orientation in the camera coordinate system. To enable implicit shape estimation, the IGRs are further extended to model the visible object surface with a point-based representation and to explicitly address the unseen surface hallucination problem. Extensive experiments validate the effectiveness of the proposed IGRs, and S-3D-RCNN achieves superior 3D scene understanding performance on the KITTI benchmark under both existing and newly proposed metrics. Code and pre-trained models will be available at this https URL.
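The distinction between observation angles and egocentric orientation can be illustrated with a minimal sketch. It is not the S-3D-RCNN method: it covers only a single yaw angle rather than full SO(3), and it assumes the KITTI camera convention (x to the right, z forward), in which the egocentric yaw equals the observation angle plus the angle of the viewing ray. The function names are hypothetical and chosen for illustration only.

```python
import math


def wrap_to_pi(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def observation_to_egocentric_yaw(alpha, x, z):
    """Convert an observation angle to an egocentric yaw.

    alpha: observation angle in radians (assumed KITTI-style `alpha`).
    x, z:  object centre in camera coordinates (x right, z forward).
    """
    ray_angle = math.atan2(x, z)  # direction of the viewing ray
    return wrap_to_pi(alpha + ray_angle)


def egocentric_to_observation_yaw(yaw, x, z):
    """Inverse of observation_to_egocentric_yaw under the same assumptions."""
    return wrap_to_pi(yaw - math.atan2(x, z))


if __name__ == "__main__":
    # Two objects with the same egocentric yaw but different positions
    # are seen under different observation angles.
    for x in (-10.0, 0.0, 10.0):
        alpha = egocentric_to_observation_yaw(0.0, x, 10.0)
        print(f"x = {x:6.1f} m, z = 10.0 m -> observation angle {alpha:+.3f} rad")
```

Under these assumptions, the same egocentric yaw yields a different observation angle depending on where the object sits in the image, so a model that regresses observation angles from local appearance still needs the object position to recover orientation in the camera frame.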