Deep generative models with latent variables have recently been used to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other, and the resulting representations may fail to embed information about the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but some of the modalities and labels required for downstream tasks are missing. In this scenario, we show that the variational lower bound limits the mutual information between joint representations and missing modalities. To counteract this problem, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective that maximizes the mutual information between joint representations and missing modalities. Extensive experiments demonstrate the benefits of the proposed model: it achieves state-of-the-art results on representative downstream problems such as classification, acoustic inversion, and annotation generation.
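The abstract refers to a likelihood-free objective that maximizes mutual information without specifying the estimator. As an illustration only, assuming an InfoNCE-style contrastive estimator rather than the model's exact objective, one standard likelihood-free lower bound on the mutual information between a joint representation \(z\) and a missing modality \(x\) contrasts the paired sample \(x^{+}\) against \(K\) candidates \(x_{1},\dots,x_{K}\) drawn from a minibatch, using a learned critic \(f\):

\[
I(z; x) \;\ge\; \mathbb{E}\!\left[\log \frac{e^{f(z,\, x^{+})}}{\tfrac{1}{K}\sum_{j=1}^{K} e^{f(z,\, x_{j})}}\right].
\]

Maximizing the right-hand side with respect to the encoder and the critic tightens the bound without requiring a tractable likelihood of the missing modality, which is what makes such objectives "likelihood-free."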