Multimodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning. One common practice is to adopt a well-performing multimodal network as the teacher, in the hope that it can transfer its full knowledge to a unimodal student for performance improvement. In this paper, we investigate the efficacy of multimodal KD. We begin with two failure cases and demonstrate that KD is not a universal cure for multimodal knowledge transfer. We present the modality Venn diagram to understand modality relationships, and the modality focusing hypothesis to reveal the decisive factor in the efficacy of multimodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose the failure cases, and point to directions for improving distillation performance.
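To make the common practice concrete, the following is a minimal sketch (not the paper's implementation) of distilling a multimodal teacher into a unimodal student with the standard soft-label KD objective. All module names, dimensions, and hyperparameters (e.g., `MultimodalTeacher`, `T`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalTeacher(nn.Module):
    """Toy teacher that fuses two modalities (e.g., audio + video) before classifying."""
    def __init__(self, dim_a=128, dim_b=128, num_classes=10):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, 64)
        self.enc_b = nn.Linear(dim_b, 64)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x_a, x_b):
        z = torch.cat([F.relu(self.enc_a(x_a)), F.relu(self.enc_b(x_b))], dim=-1)
        return self.head(z)

class UnimodalStudent(nn.Module):
    """Toy student that only sees one modality at train and test time."""
    def __init__(self, dim_a=128, num_classes=10):
        super().__init__()
        self.enc = nn.Linear(dim_a, 64)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x_a):
        return self.head(F.relu(self.enc(x_a)))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: temperature-scaled KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Illustrative training step with random data standing in for a real dataset.
teacher, student = MultimodalTeacher(), UnimodalStudent()
teacher.eval()  # assume the teacher is already trained
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x_a, x_b = torch.randn(32, 128), torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
with torch.no_grad():
    t_logits = teacher(x_a, x_b)                           # teacher sees both modalities
loss = distillation_loss(student(x_a), t_logits, labels)   # student sees only one
opt.zero_grad(); loss.backward(); opt.step()
```

The paper's point is that this transfer is not guaranteed to help; whether it does depends on which modality the teacher's decisive knowledge resides in, as formalized by the modality focusing hypothesis.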