General scene understanding for robotics requires a flexible semantic representation, so that novel objects and structures that may not have been known at training time can be identified, segmented, and grouped. We present an algorithm that fuses general learned features from a standard pre-trained network into a highly efficient 3D geometric neural field representation during real-time SLAM. The fused 3D feature maps inherit the coherence of the neural field's geometry representation. As a result, a tiny amount of human labelling, provided interactively at runtime, is enough to segment objects, or even parts of objects, robustly and accurately in an open-set manner.
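To make the fusion step concrete, the following is a minimal sketch, not the paper's actual method: it replaces the neural field with a toy voxel grid, back-projects per-pixel features from a depth frame into 3D, and fuses them with a running average. All names (`backproject`, `FeatureVoxelGrid`, the grid resolution and extent) are hypothetical illustration choices.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth image into 3D camera-frame points using intrinsics K."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel row/column grid
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)  # (h, w, 3)

class FeatureVoxelGrid:
    """Toy stand-in for the neural field: per-voxel running-average
    fusion of incoming 2D features (for illustration only)."""
    def __init__(self, res, dim, extent=2.0):
        self.res, self.extent = res, extent
        self.feat = np.zeros((res, res, res, dim))
        self.count = np.zeros((res, res, res))

    def fuse(self, points, feats):
        # Map world points into voxel indices, clipped to the grid.
        idx = np.clip(((points / self.extent + 0.5) * self.res).astype(int),
                      0, self.res - 1)
        for (i, j, k), f in zip(idx.reshape(-1, 3),
                                feats.reshape(-1, feats.shape[-1])):
            # Incremental mean keeps fused features bounded and coherent.
            self.count[i, j, k] += 1
            self.feat[i, j, k] += (f - self.feat[i, j, k]) / self.count[i, j, k]
```

In the paper the 3D representation is a trained neural field rather than an explicit grid, and the features come from a standard pre-trained network; the sketch only shows the fuse-into-geometry idea.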
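The open-set segmentation from sparse runtime labels can also be sketched. Assuming fused per-point features live in the same space as a handful of user-clicked exemplar features, each point can be labelled by its most similar click (cosine similarity). The function name and calling convention are hypothetical, not the paper's API.

```python
import numpy as np

def segment_open_set(features, click_feats, click_labels):
    """Assign every fused feature the label of its nearest user click
    in feature space (cosine similarity). Illustrative sketch only."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    c = click_feats / np.linalg.norm(click_feats, axis=-1, keepdims=True)
    sims = f @ c.T                       # (n_points, n_clicks)
    return np.asarray(click_labels)[sims.argmax(axis=-1)]
```

Because the classifier is just nearest-exemplar matching over general features, new categories can be added at runtime with a single click each, which is what makes the segmentation open-set.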