Multimodal Variational Autoencoders (VAEs) have been a subject of intense research in recent years, as they can integrate multiple modalities into a joint representation and thus serve as a promising tool for both data classification and generation. Several approaches to multimodal VAE learning have been proposed so far; however, their comparison and evaluation have been rather inconsistent. One reason is that the models differ at the implementation level; another is that the datasets commonly used in these comparisons were not originally designed to evaluate multimodal generative models. This paper addresses both issues. First, we propose a toolkit for systematic multimodal VAE training and comparison. Second, we present a synthetic bimodal dataset designed for a comprehensive evaluation of joint-generation and cross-generation capabilities. We demonstrate the utility of the dataset by comparing state-of-the-art models.