Semantic 3D scene understanding is a problem of critical importance in robotics. While significant advances have been made in simultaneous localization and mapping algorithms, robots still fall far short of an average human's common-sense knowledge about household objects and where they are typically found. We introduce a novel method that leverages the common sense embedded in large language models to label rooms based on the objects they contain. This algorithm has the added benefits of (i) requiring no task-specific pre-training (it operates entirely in the zero-shot regime) and (ii) generalizing to arbitrary room and object labels, including previously unseen ones; both are highly desirable traits for robotic scene understanding algorithms. The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems, and we hope it will pave the way toward more generalizable and scalable high-level 3D scene understanding for robotics.
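To make the zero-shot labelling idea concrete, here is a minimal sketch that assumes room labelling works by scoring each candidate room label with a language model and keeping the highest-scoring one. The sentence template, the `score_fn` interface (a function returning, e.g., a log-likelihood for a piece of text), and the `toy_score` stand-in are all hypothetical illustrations, not the authors' implementation; in practice `score_fn` would wrap a real large language model.

```python
from typing import Callable, Sequence


def build_prompt(objects: Sequence[str], room_label: str) -> str:
    """Hypothetical template: describe a room's objects, then propose a label."""
    if not objects:
        raise ValueError("a room needs at least one object to be labelled")
    if len(objects) == 1:
        listed = objects[0]
    else:
        listed = ", ".join(objects[:-1]) + " and " + objects[-1]
    return f"This room contains {listed}. This room is best described as: {room_label}."


def label_room(
    objects: Sequence[str],
    candidate_labels: Sequence[str],
    score_fn: Callable[[str], float],
) -> str:
    """Pick the candidate label whose prompt the scorer rates highest.

    No training happens here (zero-shot), and candidate_labels can be any
    strings, including labels never seen before.
    """
    if not candidate_labels:
        raise ValueError("at least one candidate room label is required")
    return max(candidate_labels, key=lambda label: score_fn(build_prompt(objects, label)))


# Illustrative stand-in for an LLM scorer: it counts objects associated with
# whichever label appears in the text. It is NOT a language model.
ASSOCIATIONS = {
    "kitchen": {"stove", "refrigerator", "sink"},
    "bedroom": {"pillow", "wardrobe", "nightstand"},
}


def toy_score(text: str) -> float:
    return float(
        sum(obj in text for label, objs in ASSOCIATIONS.items() if label in text for obj in objs)
    )


print(label_room(["stove", "sink", "pillow"], ["kitchen", "bedroom"], toy_score))  # prints "kitchen"
```

Keeping the scorer pluggable means the same selection loop works for any room vocabulary; note that if `score_fn` returns summed token log-probabilities, longer labels are penalized, so some form of length normalization may be worth considering.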