We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It provides robust camera tracking in dynamic environments while continuously estimating geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse the corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method runs at 2-3 Hz on a CPU, excluding the instance segmentation component. We demonstrate its effectiveness by quantitatively and qualitatively evaluating it on both synthetic and real-world sequences.
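To make the fusion step concrete, the following is a minimal sketch, assuming a pinhole camera model, of integrating one masked depth frame into a sparse, object-level TSDF volume. The system itself uses an octree; a Python dict keyed by integer voxel coordinates stands in for it here, and all names (`ObjectModel`, `fuse_depth`, the voxel size and truncation constants) are illustrative assumptions, not the system's actual API.

```python
# Illustrative sketch only: fuse a masked depth image into a sparse,
# object-centric TSDF volume. A dict of voxel keys stands in for the
# paper's octree; all names and constants here are assumptions.
import numpy as np

VOXEL_SIZE = 0.02   # voxel edge length in metres (assumed)
TRUNCATION = 0.06   # TSDF truncation band in metres (assumed)


class ObjectModel:
    """One object instance: a 6-DoF pose plus a sparse volumetric grid."""

    def __init__(self):
        self.pose = np.eye(4)   # T_world_object, updated by object tracking
        self.voxels = {}        # (ix, iy, iz) -> [tsdf, weight, fg_prob]

    def fuse_depth(self, depth, mask, K, T_world_cam):
        """Integrate a masked depth image (projective TSDF along each ray)."""
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        # Geometry is stored in the object's local frame, so the model
        # stays consistent as the object moves through the scene.
        T_obj_cam = np.linalg.inv(self.pose) @ T_world_cam
        vs, us = np.nonzero(mask & (depth > 0))
        for u, v in zip(us, vs):
            z = depth[v, u]
            ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
            # Update voxels in the truncation band around the measured surface.
            for d in np.arange(z - TRUNCATION, z + TRUNCATION, VOXEL_SIZE):
                p_obj = T_obj_cam @ np.append(ray * d, 1.0)
                key = tuple(np.floor(p_obj[:3] / VOXEL_SIZE).astype(int))
                sdf = np.clip((z - d) / TRUNCATION, -1.0, 1.0)
                tsdf, w, fg = self.voxels.get(key, (0.0, 0.0, 0.5))
                # Running weighted averages for TSDF and foreground probability
                # (the observed pixel is foreground, so the evidence is 1.0).
                self.voxels[key] = [(tsdf * w + sdf) / (w + 1.0),
                                    w + 1.0,
                                    (fg * w + 1.0) / (w + 1.0)]


# Toy usage: a small flat depth plane at 1 m, fully foreground,
# identity camera and object poses.
depth = np.full((48, 64), 1.0, dtype=np.float32)
mask = np.ones_like(depth, dtype=bool)
K = np.array([[525.0, 0.0, 31.5], [0.0, 525.0, 23.5], [0.0, 0.0, 1.0]])
obj = ObjectModel()
obj.fuse_depth(depth, mask, K, np.eye(4))
print(len(obj.voxels), "voxels touched")
```

In a full pipeline, one such model would be kept per detected instance, with each refined mask fused into its associated model after the object and camera poses have been tracked for that frame.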