Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in augmented reality and robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping from posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects in a given RGB frame and associate them with a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and an object scale prior. We validate the proposed system on ScanNet, where we show a significant improvement over existing RGB-only methods.
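To make the super-quadric bounding volume concrete, the following is a minimal sketch of the textbook super-quadric inside-outside function, evaluated in the object's canonical frame. It is not taken from ODAM: the function name `superquadric_inside_outside` and the parameters `scale`, `eps1`, and `eps2` are illustrative assumptions, and the system's actual parameterization may differ.

```python
import numpy as np


def superquadric_inside_outside(points, scale, eps1, eps2):
    """Evaluate the standard super-quadric inside-outside function F.

    F < 1 means the point is inside, F == 1 on the surface, F > 1 outside.
    All names here are illustrative assumptions, not ODAM's API.

    points: (N, 3) array of points in the object's canonical frame.
    scale:  (3,) semi-axis lengths (a1, a2, a3).
    eps1, eps2: shape exponents; 1.0 gives an ellipsoid, values near 0
                give a box-like shape.
    """
    # The shape is symmetric about each axis, so absolute values suffice
    # and avoid fractional powers of negative numbers.
    p = np.abs(np.asarray(points, dtype=float)) / np.asarray(scale, dtype=float)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    xy_term = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2.0 / eps1)


# A point close to the corner of the 1 x 2 x 3 half-extent box.
corner = np.array([[0.9, 1.8, 2.7]])
scale = (1.0, 2.0, 3.0)

f_ellipsoid = superquadric_inside_outside(corner, scale, eps1=1.0, eps2=1.0)
f_boxlike = superquadric_inside_outside(corner, scale, eps1=0.1, eps2=0.1)
```

With the same semi-axes, the near-corner point lies outside the ellipsoid (`eps1 = eps2 = 1`) but inside the box-like shape (`eps1 = eps2 = 0.1`), which illustrates why a super-quadric can bound box-shaped objects more tightly than an ellipsoid can.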