Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning a joint representation of diverse modalities remains a long-standing challenge in emerging machine learning applications and research. Previous generative approaches to multimodal input approximate the joint-modality posterior with unimodal posteriors combined as a product-of-experts (PoE) or mixture-of-experts (MoE). We argue that these approximations lead to a defective bound for the optimization process and to a loss of semantic connection among modalities. This paper presents a novel variational method on sets, the Set Multimodal VAE (SMVAE), for learning a multimodal latent space while handling the missing-modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensates for the drawbacks caused by factorization. Experimental results on public datasets from various domains demonstrate that the proposed method supports order-agnostic cross-modal generation while achieving superior performance compared to state-of-the-art multimodal methods. The source code for our method is available online at https://anonymous.4open.science/r/SMVAE-9B3C/.
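For concreteness, the sketch below illustrates the two factorized fusion schemes the abstract contrasts against: a PoE that fuses unimodal diagonal-Gaussian posteriors (plus a standard-normal prior expert) by summing precisions, and a uniform MoE that samples from one unimodal expert at a time. The numpy implementation, function names, and toy dimensions are illustrative assumptions for exposition, not the SMVAE code.

```python
import numpy as np

def poe_gaussian(mus, logvars):
    """Product-of-experts fusion of diagonal-Gaussian unimodal posteriors.

    Given M experts N(mu_m, sigma_m^2) of shape (M, D), plus an implicit
    standard-normal prior expert, returns the mean and log-variance of the
    product Gaussian: precisions add, and the mean is precision-weighted.
    """
    # Prepend the standard-normal prior (mu=0, logvar=0) as an extra expert.
    mus = np.concatenate([np.zeros((1,) + mus.shape[1:]), mus], axis=0)
    logvars = np.concatenate([np.zeros((1,) + logvars.shape[1:]), logvars], axis=0)
    precisions = np.exp(-logvars)                    # 1 / sigma_m^2
    joint_var = 1.0 / precisions.sum(axis=0)         # summed precisions
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.log(joint_var)

def moe_sample(mus, logvars, rng):
    """Mixture-of-experts with uniform weights: pick one unimodal expert
    at random and draw z from it via the reparameterization trick."""
    m = rng.integers(len(mus))
    eps = rng.standard_normal(mus[m].shape)
    return mus[m] + np.exp(0.5 * logvars[m]) * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: two modalities, 4-dimensional latent space.
    mus = rng.standard_normal((2, 4))
    logvars = rng.standard_normal((2, 4))
    print(poe_gaussian(mus, logvars))
    print(moe_sample(mus, logvars, rng))
```

Note that both schemes touch the unimodal posteriors only through this fixed combination rule; the paper's argument is that such factorizations lose cross-modal semantic structure that modeling the joint posterior directly can preserve.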