We propose a novel method, Modality-based Redundancy Reduction Fusion (MRRF), for understanding and modulating the relative contribution of each modality in multimodal inference tasks. This is achieved by forming an $(M+1)$-way tensor that captures the high-order relationships between the $M$ modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which adopts a different factor for each modality, removes information in a modality that can be compensated by the other modalities with respect to the model outputs. This helps to reveal the relative utility of the information in each modality. In addition, it leads to a simpler model with fewer parameters and can therefore be applied as a regularizer that mitigates overfitting. We have applied this method to three multimodal datasets covering sentiment analysis, personality trait recognition, and emotion recognition. We are able to identify the relationships and relative importance of the different modalities in these tasks and achieve a 1\% to 4\% improvement over the state-of-the-art on several evaluation measures for all three tasks.
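To make the fusion idea concrete, the following is a minimal sketch (not the authors' released implementation) of how an $(M+1)$-way interaction with modality-specific low-rank factors might be realized for two modalities in PyTorch. The class name, the ranks, and the appended constant-1 feature are illustrative assumptions; the core idea shown is compressing each modality with its own factor before forming the outer-product interaction that feeds the output layer.

\begin{verbatim}
# Hypothetical sketch: low-rank, modality-specific tensor fusion (2 modalities).
import torch
import torch.nn as nn

class LowRankTensorFusion(nn.Module):
    """Project each modality with its own low-rank factor, form the
    per-sample outer product (the (M+1)-way interaction, M=2 here),
    and map the fused representation to the output layer."""
    def __init__(self, dim_a, dim_b, rank_a, rank_b, out_dim):
        super().__init__()
        # +1 appends a constant 1 so unimodal terms survive the outer product
        self.factor_a = nn.Linear(dim_a + 1, rank_a, bias=False)
        self.factor_b = nn.Linear(dim_b + 1, rank_b, bias=False)
        self.core = nn.Linear(rank_a * rank_b, out_dim)

    def forward(self, x_a, x_b):
        ones = torch.ones(x_a.size(0), 1, device=x_a.device)
        z_a = self.factor_a(torch.cat([x_a, ones], dim=1))   # (B, rank_a)
        z_b = self.factor_b(torch.cat([x_b, ones], dim=1))   # (B, rank_b)
        fused = torch.einsum('bi,bj->bij', z_a, z_b).flatten(1)
        return self.core(fused)
\end{verbatim}

Choosing a small rank for one modality shrinks its factor and thus limits how much of that modality's information can influence the fused tensor, which is one way to probe its relative utility while also reducing parameter count.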