Existing state-of-the-art point descriptors rely on structure information only, omitting texture information. However, texture is crucial for humans to distinguish different parts of a scene. Moreover, current learning-based point descriptors are black boxes, and it is unclear how the original points contribute to the final descriptor. In this paper, we propose a new multimodal fusion method that generates a point cloud registration descriptor by considering both structure and texture information. Specifically, a novel attention-fusion module is designed to extract weighted texture information for descriptor extraction. In addition, we propose an interpretable module to explain how the original points contribute to the final descriptor. We use a descriptor element as the loss to backpropagate to the target layer and take the gradient as the significance of each point to the final descriptor. This paper moves one step further toward explainable deep learning in the registration task. Comprehensive experiments on 3DMatch, 3DLoMatch, and KITTI demonstrate that the multimodal fusion descriptor achieves state-of-the-art accuracy and improves the descriptor's distinctiveness. We also demonstrate the effectiveness of our interpretable module in explaining the registration descriptor extraction.
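To make the gradient-as-significance idea concrete, here is a minimal PyTorch sketch, not the paper's actual implementation: it assumes a hypothetical `model` that maps an (N, 3) point cloud to a D-dimensional descriptor, treats one descriptor element as the scalar loss, backpropagates to the input points, and reads the gradient magnitude as each point's significance.

```python
import torch

def point_significance(model, points, element_idx):
    """Score each input point's contribution to one descriptor element.

    Assumed interfaces (illustrative, not the paper's API):
      model:       callable mapping an (N, 3) tensor to a (D,) descriptor.
      points:      (N, 3) tensor of xyz coordinates.
      element_idx: index of the descriptor element used as the scalar loss.
    Returns an (N,) tensor of per-point significance scores.
    """
    points = points.clone().requires_grad_(True)  # track gradients w.r.t. input points
    descriptor = model(points)                    # (D,) descriptor for this point cloud
    descriptor[element_idx].backward()            # backpropagate a single descriptor element
    # The gradient magnitude at each point serves as its significance score.
    return points.grad.norm(dim=1)
```

In this sketch the gradient is taken at the input points for simplicity; the paper instead backpropagates to a chosen target layer, which would replace `points.grad` with the gradient captured at that layer (e.g., via a hook).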