OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

Although monocular 3D object detection has recently made a significant leap forward thanks to pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and cannot explicitly encapsulate the geometric relation between depth and the object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth together with depth-bounding box residuals and object bounding boxes. This allows a two-stream detection of 3D objects and leads to significantly more robust detections. The first stream, denoted the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding box based geometry projection scheme is employed to enhance distance perception. The second stream, named the Context Stream, directly regresses 3D object location and size. This novel two-stream representation further enables us to enforce cross-stream consistency terms that align the outputs of both streams, improving the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category while maintaining real-time inference speed. We plan to release all code and trained models soon.
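To make the two-stream idea more concrete, the following is a minimal sketch of pixel-wise aggregation under strong simplifying assumptions. It is not the OPA-3D implementation. The function names (`aggregate_center_depth`, `consistency_loss`), the per-pixel visibility weights, and the use of a weighted mean in place of the paper's occlusion-aware optimization are all hypothetical choices made for illustration. Each pixel inside an object's region casts a vote for the depth of the box center by adding its predicted residual to its visible depth, and occluded pixels are down-weighted.

```python
import numpy as np

def aggregate_center_depth(visible_depth, residual, visibility, mask):
    """Hypothetical geometry-stream aggregation for one object.

    visible_depth: (H, W) dense depth of the visible surface
    residual:      (H, W) predicted offset from the visible surface
                   to the depth of the bounding box center
    visibility:    (H, W) weights in [0, 1]; assumed low for occluded pixels
    mask:          (H, W) boolean region belonging to the object
    """
    votes = visible_depth + residual        # per-pixel estimate of the center depth
    weights = visibility * mask             # ignore pixels outside the object
    total = weights.sum()
    if total == 0:
        raise ValueError("object has no visible pixels")
    return float((weights * votes).sum() / total)

def consistency_loss(geometry_depth, context_depth):
    """Assumed L1 cross-stream term: penalize disagreement between streams."""
    return abs(geometry_depth - context_depth)

# Toy 2x2 example: the bottom-left pixel belongs to an occluder at 4 m,
# so its visibility weight is zero and its vote is ignored.
visible_depth = np.array([[10.0, 10.0], [4.0, 10.0]])
residual = np.array([[1.0, 1.0], [1.0, 1.0]])
visibility = np.array([[1.0, 1.0], [0.0, 1.0]])
mask = np.ones((2, 2), dtype=bool)

geometry_depth = aggregate_center_depth(visible_depth, residual, visibility, mask)
loss = consistency_loss(geometry_depth, context_depth=11.5)
```

In this toy case the three visible pixels all vote for the same center depth, so the occluder does not bias the estimate. The consistency term then measures how far a directly regressed depth (here an assumed value of 11.5) lies from the aggregated one.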
