Multi-modality fusion methods such as concatenation, summation, and encoder-decoder-based fusion have recently been used to combine the features of hyperspectral image (HSI) and light detection and ranging (LiDAR) data. However, these methods consider the relationship between HSI and LiDAR signals from limited perspectives. More specifically, they overlook the contextual information across the HSI and LiDAR modalities and the intra-modality characteristics of LiDAR. In this paper, we offer a new view of feature fusion that explores the relationships across the HSI and LiDAR modalities comprehensively, and we propose an Interconnected Fusion (IF) framework. First, the center patch of the HSI input is extracted and replicated to the size of the HSI input. Then, nine different perspectives in the fusion matrix are generated by computing self-attention and cross-attention among the replicated center patch, the HSI input, and the corresponding LiDAR input. In this way, intra- and inter-modality characteristics can be fully exploited, and contextual information is considered both within and across modalities. The nine interrelated elements of the fusion matrix can complement each other and eliminate biases, yielding a multi-modality representation for accurate classification. Extensive experiments were conducted on three widely used datasets: Trento, MUUFL, and Houston. The IF framework achieves state-of-the-art results on these datasets compared with existing approaches.
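The nine-perspective fusion matrix described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the attention form, the feature dimensions, or how the nine elements are combined. The sketch assumes that HSI and LiDAR patches have already been embedded into token sequences of equal length and width, that attention is plain scaled dot-product attention without learned projections, that the "center patch" is a single center token, and that the nine outputs are simply averaged. All function names (`attention`, `replicate_center`, `fusion_matrix`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key_value):
    """Scaled dot-product attention (assumed form, no learned projections).

    query: (n, d), key_value: (m, d) -> output: (n, d)
    """
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ key_value

def replicate_center(tokens, patch_size):
    """Copy the center token's feature to every position of the patch.

    Assumption: the center patch is a single token of a square patch.
    """
    center = (patch_size // 2) * patch_size + patch_size // 2
    return np.tile(tokens[center], (tokens.shape[0], 1))

def fusion_matrix(hsi, lidar, patch_size):
    """Return a 3x3 nested list; entry [i][j] attends from source i to source j.

    Diagonal entries are self-attention, off-diagonal entries are cross-attention.
    """
    sources = [replicate_center(hsi, patch_size), hsi, lidar]
    return [[attention(q, kv) for kv in sources] for q in sources]

# Toy inputs: a 5x5 patch flattened to 25 tokens with 16 features each.
rng = np.random.default_rng(0)
patch_size, dim = 5, 16
hsi = rng.normal(size=(patch_size * patch_size, dim))
lidar = rng.normal(size=(patch_size * patch_size, dim))

F = fusion_matrix(hsi, lidar, patch_size)

# Placeholder aggregation (an assumption): average the nine perspectives.
fused = np.mean([F[i][j] for i in range(3) for j in range(3)], axis=0)
print(len(F) * len(F[0]), fused.shape)
```

Indexing the sources as (replicated center, HSI, LiDAR) makes the row the query source and the column the key/value source, so the three diagonal entries are intra-modality views and the six off-diagonal entries are inter-modality views.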