Autonomous robots that interact with their environment require a detailed semantic scene model. Volumetric semantic maps are frequently used for this purpose, and scene understanding can be further improved by including object-level information in the map. In this work, we extend a multi-view 3D semantic mapping system, consisting of a network of distributed smart edge sensors, with object-level information to enable downstream tasks that need object-level input. Objects are represented in the map either via their 3D mesh model or as an object-centric volumetric sub-map that can model arbitrary object geometry when no detailed 3D model is available. We propose a keypoint-based approach that estimates object poses via PnP and refines them via ICP alignment of the 3D object model with the observed point cloud segments. Object instances are tracked to integrate observations over time and to provide robustness against temporary occlusions. Our method is evaluated on the public Behave dataset, where it achieves pose estimation accuracy within a few centimeters, and in real-world experiments with the sensor network in a challenging lab environment, where multiple chairs and a table are tracked through the scene online, in real time, even under high occlusions.
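To illustrate the described two-stage pose estimation pipeline (PnP initialization followed by ICP refinement), the following is a minimal sketch, not the authors' implementation. It assumes hypothetical inputs: `keypoints_2d` (N×2 keypoint detections in the image), `keypoints_3d` (the corresponding N×3 points on the object model), `K` (3×3 camera intrinsics), and `model_pcd`/`segment_pcd` (Open3D point clouds of the object model and the observed segment). OpenCV's `solvePnP` and Open3D's ICP registration stand in for the paper's components, and the 5 cm correspondence gate is an assumed value.

```python
import numpy as np
import cv2
import open3d as o3d


def estimate_object_pose(keypoints_2d, keypoints_3d, K, model_pcd, segment_pcd):
    """Estimate a 6-DoF object pose: PnP from keypoints, then ICP refinement."""
    # Stage 1: initial pose from 2D-3D keypoint correspondences via PnP.
    ok, rvec, tvec = cv2.solvePnP(
        keypoints_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
    )
    if not ok:
        return None

    # Assemble the 4x4 homogeneous transform from the PnP result.
    R, _ = cv2.Rodrigues(rvec)
    T_init = np.eye(4)
    T_init[:3, :3] = R
    T_init[:3, 3] = tvec.ravel()

    # Stage 2: refine by aligning the 3D object model to the observed
    # point cloud segment with ICP, starting from the PnP estimate.
    result = o3d.pipelines.registration.registration_icp(
        model_pcd,
        segment_pcd,
        max_correspondence_distance=0.05,  # 5 cm gate (assumed value)
        init=T_init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation  # refined model-to-camera transform
```

In this sketch, the PnP stage provides a coarse pose that is cheap to compute from keypoint detections, while the ICP stage exploits the dense depth observations to correct residual keypoint noise; feeding ICP a good initialization is what keeps it from converging to a wrong local alignment.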