3D dense captioning is a recently proposed task that operates on point clouds, which contain more geometric information than their 2D counterparts. It is also more challenging, because the inter-object relations in point clouds are more complex and more varied. Existing methods treat such relations only as by-products of object feature learning in graphs and do not encode them explicitly, which leads to sub-optimal results. In this paper, we aim to improve 3D dense captioning by capturing and utilizing the complex relations in a 3D scene. To this end we propose MORE, a Multi-Order RElation mining model that supports the generation of more descriptive and comprehensive captions. MORE encodes object relations progressively, since complex relations can be deduced from a limited number of basic ones. We first devise a novel Spatial Layout Graph Convolution (SLGC), which semantically encodes several first-order relations as the edges of a graph constructed over 3D object proposals. From the resulting graph, we then extract multiple triplets, each encapsulating a basic first-order relation as the basic unit, and construct several Object-centric Triplet Attention Graphs (OTAG) to infer multi-order relations for every target object. The updated node features from OTAG are aggregated and fed into the caption decoder to provide abundant relational cues, so that the generated captions can include diverse relations with context objects. Extensive experiments on the Scan2Cap dataset demonstrate the effectiveness of MORE and its components, and MORE outperforms the current state-of-the-art method. Our code is available at https://github.com/SxJyJay/MORE.
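To make the idea of first-order relations as graph edges and object-centric triplets more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the relation vocabulary, the axis convention (x to the right, y to the back, z up), the `eps` threshold, and all function names are hypothetical. It also omits the learned graph convolution and attention steps entirely.

```python
from itertools import permutations

# Hypothetical relation names per axis: (positive offset, negative offset).
# The abstract does not list the actual first-order relations used by MORE.
AXIS_RELATIONS = {
    "x": ("right of", "left of"),
    "y": ("behind", "in front of"),
    "z": ("above", "below"),
}


def first_order_relation(center_a, center_b, eps=0.1):
    """Label the dominant spatial relation of object a with respect to object b."""
    offsets = dict(zip("xyz", (a - b for a, b in zip(center_a, center_b))))
    # Pick the axis along which the two centers are furthest apart.
    axis = max(offsets, key=lambda k: abs(offsets[k]))
    if abs(offsets[axis]) < eps:
        return "near"
    positive, negative = AXIS_RELATIONS[axis]
    return positive if offsets[axis] > 0 else negative


def build_relation_edges(centers):
    """Return {(i, j): relation} for every ordered pair of object proposals."""
    return {
        (i, j): first_order_relation(centers[i], centers[j])
        for i, j in permutations(range(len(centers)), 2)
    }


def triplets_for_target(edges, target):
    """Collect (subject, relation, object) triplets whose subject is the target."""
    return [(i, rel, j) for (i, j), rel in edges.items() if i == target]


# Three proposal centers: a reference object, one to its right, one above it.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
edges = build_relation_edges(centers)
target_triplets = triplets_for_target(edges, target=0)
```

In this sketch each triplet is a plain tuple. In a model along the lines the abstract describes, the triplets would instead carry learned features and be combined by attention to infer higher-order relations for the target object.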