Part2Word: Learning Joint Embedding of Point Clouds and Text by Matching Parts to Words

Learning a joint embedding of 3D shapes and text is important for shape understanding tasks such as shape-text matching, retrieval, and shape captioning. Current multi-view based methods learn a mapping from multiple rendered views to text. However, these methods cannot analyze 3D shapes well because of self-occlusion and the limitations of learning manifolds. To resolve this issue, we propose a method that learns a joint embedding of point clouds and text by matching parts from shapes to words from sentences in a common space. Specifically, we first learn a segmentation prior to segment point clouds into parts. Then, we map parts and words into an optimized space, where the parts and words can be matched with each other. In the optimized space, we represent a part by aggregating the features of all points within the part, and we represent each word with its context information; we train our network to minimize a triplet ranking loss. Moreover, we introduce cross-modal attention to capture part-word relationships in this matching procedure, which enhances joint embedding learning. In our experiments, our method outperforms the state of the art in multi-modal retrieval on a widely used benchmark.
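The matching pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, mean pooling as the aggregation step, the softmax temperature, the margin value, and the use of hardest in-batch negatives are all assumptions made for illustration. The abstract only states that part features are aggregated from points, that cross-modal attention relates parts to words, and that a triplet ranking loss is minimized.

```python
import numpy as np

def aggregate_parts(point_feats, part_ids, num_parts):
    """Pool per-point features into one feature per part (mean pooling is an assumption)."""
    parts = np.zeros((num_parts, point_feats.shape[1]))
    for p in range(num_parts):
        mask = part_ids == p
        if mask.any():
            parts[p] = point_feats[mask].mean(axis=0)
    return parts

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def shape_text_similarity(parts, words, temperature=4.0):
    """Each word attends over all parts; the score is the mean cosine
    similarity between a word and its attended part context.
    The temperature value is a hypothetical choice."""
    parts = l2_normalize(parts)
    words = l2_normalize(words)
    scores = words @ parts.T                      # (num_words, num_parts)
    weights = np.exp(temperature * scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over parts
    attended = l2_normalize(weights @ parts)      # (num_words, dim)
    return float(np.mean(np.sum(attended * words, axis=1)))

def triplet_ranking_loss(sim, margin=0.2):
    """Bidirectional hinge loss over a batch similarity matrix.
    sim[i, j] is the similarity of shape i and text j; the diagonal holds
    matched pairs. Using the hardest negative per row and column is an
    assumption, as is the margin."""
    n = sim.shape[0]
    pos = np.diag(sim)
    negatives = np.where(np.eye(n, dtype=bool), -np.inf, sim)
    hardest_text = negatives.max(axis=1)   # worst wrong text for each shape
    hardest_shape = negatives.max(axis=0)  # worst wrong shape for each text
    loss = (np.maximum(0.0, margin + hardest_text - pos)
            + np.maximum(0.0, margin + hardest_shape - pos))
    return float(loss.mean())
```

In a trained system the point and word features would come from learned encoders, and the similarity matrix passed to `triplet_ranking_loss` would be filled by calling `shape_text_similarity` for every shape-text pair in a batch.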
