Multimodal emotion recognition leverages complementary information across modalities to improve performance. However, the data of all modalities cannot be guaranteed to be present in practice. In studies that predict missing data across modalities, the inherent difference between heterogeneous modalities, namely the modality gap, presents a challenge. To address this, we propose to use invariant features for a missing modality imagination network (IF-MMIN), which includes two novel mechanisms: 1) an invariant feature learning strategy based on the central moment discrepancy (CMD) distance under the full-modality scenario; 2) an invariant-feature-based imagination module (IF-IM) that alleviates the modality gap during missing-modality prediction, thus improving the robustness of the multimodal joint representation. Comprehensive experiments on the benchmark dataset IEMOCAP demonstrate that the proposed model outperforms all baselines and consistently improves overall emotion recognition performance under uncertain missing-modality conditions. We release the code at: https://github.com/ZhuoYulang/IF-MMIN.
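
For readers unfamiliar with the CMD distance that drives the invariant feature learning strategy, the sketch below shows one common way to compute it between two batches of features (Zellinger et al.'s central moment discrepancy). This is a minimal illustration only, assuming batched PyTorch feature tensors bounded in [a, b]; the function name `cmd`, the default number of moments, and the bounds are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def cmd(x: torch.Tensor, y: torch.Tensor,
        n_moments: int = 5, a: float = 0.0, b: float = 1.0) -> torch.Tensor:
    """Central moment discrepancy between feature batches x, y of shape
    (batch, dim), whose values are assumed to lie in the interval [a, b]."""
    mx, my = x.mean(dim=0), y.mean(dim=0)
    # First-order term: distance between the means, scaled by the value range.
    d = (mx - my).norm(p=2) / (b - a)
    cx, cy = x - mx, y - my
    # Higher-order terms: distance between the k-th central moments.
    for k in range(2, n_moments + 1):
        ck_x = cx.pow(k).mean(dim=0)
        ck_y = cy.pow(k).mean(dim=0)
        d = d + (ck_x - ck_y).norm(p=2) / (b - a) ** k
    return d

# Illustrative usage: penalize the discrepancy between two modality-specific
# feature batches so that the encoders converge toward modality-invariant features.
audio_feat = torch.rand(32, 128)  # hypothetical audio features in [0, 1]
text_feat = torch.rand(32, 128)   # hypothetical text features in [0, 1]
invariance_loss = cmd(audio_feat, text_feat)
```

A practical appeal of CMD as an auxiliary loss is that it matches distributions moment by moment in closed form, with no kernel or adversarial component, so it adds little overhead to full-modality training.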