Deep generative models with latent variables have recently been used to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other, and the learned representations can fail to embed information about the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but some of the modalities and labels required for downstream tasks are missing. We show that, in this scenario, the variational lower bound limits the mutual information between joint representations and missing modalities. To counteract this problem, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experiments demonstrate the benefits of the proposed model: it achieves state-of-the-art results on representative problems such as downstream classification, acoustic inversion, and image and annotation generation.
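To make the claim that the variational lower bound limits mutual information more concrete, the following is a minimal sketch in standard VAE notation (our own illustrative notation, not necessarily the paper's exact derivation): with encoder $q_\phi(z \mid x)$, aggregate posterior $q_\phi(z) = \mathbb{E}_{p(x)}[q_\phi(z \mid x)]$, and prior $p(z)$, the expected KL term that the lower bound drives down is an upper bound on the mutual information between the data and the representation,
\begin{align}
  I_q(x; z)
    &= \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big( q_\phi(z \mid x) \,\|\, q_\phi(z) \big) \right] \\
    &= \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big( q_\phi(z \mid x) \,\|\, p(z) \big) \right]
       - D_{\mathrm{KL}}\!\big( q_\phi(z) \,\|\, p(z) \big) \\
    &\le \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big( q_\phi(z \mid x) \,\|\, p(z) \big) \right],
\end{align}
so maximizing the standard evidence lower bound penalizes $I_q(x; z)$, which is consistent with the abstract's observation that joint representations can lose information about (missing) modalities.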