We propose an end-to-end solution to the problem of object localisation in partial scenes, where the goal is to estimate the position of an object in an unobserved part of the scene given only a partial 3D scan. We introduce a novel scene representation to facilitate geometric reasoning, the Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph enriched with additional concept nodes from a commonsense knowledge base. Specifically, the nodes of D-SCG represent the scene objects and the edges encode their relative positions. Each object node is further connected, via different commonsense relationships, to a set of concept nodes. With this graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism. The network first predicts the relative position between the target object and each visible object, learning a rich object representation by aggregating information from both the object nodes and the concept nodes of D-SCG. These relative positions are then merged to obtain the final position. We evaluate our method on Partial ScanNet, improving the state of the art by 5.9% in localisation accuracy while training 8x faster.
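The abstract outlines a two-step estimation scheme: predict a relative offset from each visible object to the target, then merge the per-object estimates into one final position. The snippet below is a minimal, hypothetical PyTorch sketch of that merging step with learned confidence weights; the module name, feature dimensions, and weighting scheme are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal illustrative sketch (not the paper's implementation): given hidden
# features of the visible object nodes after message passing over the scene
# and concept graph, predict a 3D offset from each visible object to the
# target, then merge the per-object estimates with learned confidence weights.
import torch
import torch.nn as nn


class RelativeOffsetAggregator(nn.Module):
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        # Predicts a 3D offset (target minus object centre) per object embedding.
        self.offset_head = nn.Linear(hidden_dim, 3)
        # Predicts a scalar confidence used to weight each object's estimate.
        self.confidence_head = nn.Linear(hidden_dim, 1)

    def forward(self, node_feats: torch.Tensor, node_centers: torch.Tensor) -> torch.Tensor:
        """
        node_feats:   (N, hidden_dim) embeddings of the N visible objects.
        node_centers: (N, 3) object centroids in scene coordinates.
        returns:      (3,) estimated target position.
        """
        offsets = self.offset_head(node_feats)                              # (N, 3)
        per_object_guess = node_centers + offsets                           # (N, 3)
        weights = torch.softmax(self.confidence_head(node_feats), dim=0)    # (N, 1)
        return (weights * per_object_guess).sum(dim=0)                      # (3,)


# Toy usage with random features for 5 visible objects.
if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(5, 128)
    centers = torch.rand(5, 3) * 4.0
    model = RelativeOffsetAggregator(hidden_dim=128)
    print(model(feats, centers))  # a single (x, y, z) estimate
```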