Existing multimodal tasks mostly target the complete input modality setting, i.e., each modality is either complete or completely missing in both the training and test sets. The setting in which modalities are randomly missing, however, remains underexplored. In this paper, we present a novel approach named MM-Align to address the missing-modality inference problem. Concretely, we propose 1) an alignment dynamics learning module based on the theory of optimal transport (OT) for indirect missing data imputation; 2) a denoising training algorithm to simultaneously enhance the imputation results and backbone network performance. Compared with previous methods, which are devoted to reconstructing the missing inputs, MM-Align learns to capture and imitate the alignment dynamics between modality sequences. Comprehensive experiments on three datasets covering two multimodal tasks empirically demonstrate that our method performs more accurate and faster inference and alleviates overfitting under various missing conditions.
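The abstract does not specify how the alignment dynamics module is built, so the following is only a generic illustration of what an OT-based alignment between two modality sequences can look like, not MM-Align's implementation. It is a minimal sketch of entropy-regularised optimal transport (Sinkhorn iterations) in plain Python. The function names, the toy feature values, the uniform step weights, and the `eps` and `n_iters` settings are all assumptions made for the example.

```python
import math


def cost_matrix(xs, ys):
    """Squared Euclidean distance between every pair of feature vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropy-regularised OT plan between two uniformly weighted sequences.

    eps and n_iters are illustrative defaults, not values from the paper.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the source steps (assumption)
    b = [1.0 / m] * m  # uniform mass on the target steps (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, ys):
    """Map each source step to the plan-weighted average of target features."""
    dim = len(ys[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append(
            [sum(row[j] * ys[j][d] for j in range(len(ys))) / mass for d in range(dim)]
        )
    return mapped


# Toy stand-ins for two modality sequences of different lengths.
text_feats = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]]
audio_feats = [[0.0, 0.1], [0.4, 0.5], [0.6, 0.4], [1.0, 0.1]]

plan = sinkhorn_plan(cost_matrix(text_feats, audio_feats))
aligned = barycentric_map(plan, audio_feats)
```

Here `plan[i][j]` is the amount of mass moved from step `i` of the first sequence to step `j` of the second, and `aligned` gives, for each step of the first sequence, a soft estimate in the second modality's feature space. A learned module of the kind the abstract describes would presumably predict such alignment behaviour when one modality is unavailable, rather than compute it from both sequences as this sketch does.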