Multimodal learning is a framework for building models that make predictions from multiple modalities of data. Two central challenges in multimodal learning are inferring shared representations from arbitrary modalities and generating one modality from another via these representations; both require taking the heterogeneous nature of multimodal data into account. Deep generative models, i.e., generative models whose distributions are parameterized by deep neural networks, have attracted much attention in recent years. Variational autoencoders in particular are well suited to these challenges because they can account for heterogeneity and infer good representations of data. Consequently, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed. In this paper, we provide a categorized survey of studies on multimodal deep generative models.