Data augmentation can effectively improve the performance of multimodal machine learning. This paper introduces a generative model for data augmentation that leverages the correlations among multiple modalities. Unlike conventional data augmentation approaches, which apply low-level operations with deterministic heuristics, our method learns an augmentation sampler that generates samples of the target modality conditioned on the observed modalities within the variational autoencoder (VAE) framework. In addition, the proposed model can quantify the confidence of augmented data through its generative probability, and it can be updated jointly with a downstream pipeline. Experiments on Visual Question Answering (VQA) tasks demonstrate the effectiveness of the proposed generative model, which boosts strong UpDn-based models to state-of-the-art performance.
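The abstract does not specify network architectures or the exact confidence measure. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. It assumes a linear-Gaussian encoder q(z | x_tgt, x_obs), a conditional prior p(z | x_obs), and a decoder p(x_tgt | z, x_obs), all with random, untrained weights. The class `LinearConditionalVAE` and its methods are hypothetical. The sketch shows three pieces the abstract names: a conditional VAE objective (the ELBO) for a target modality given an observed one, a sampler that draws augmented target-modality samples, and a confidence score. The confidence score is assumed here to be a Monte Carlo estimate of log p(x_tgt | x_obs).

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gaussian_log_prob(x, mean, logvar):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(LOG_2PI + logvar + (x - mean) ** 2 / np.exp(logvar), axis=-1)


class LinearConditionalVAE:
    """Hypothetical toy conditional VAE with linear maps and untrained weights.

    x_obs: observed-modality features, x_tgt: target-modality features, z: latent.
    """

    def __init__(self, obs_dim, tgt_dim, z_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        s = 0.1
        # Encoder q(z | x_tgt, x_obs): mean and log-variance heads.
        self.W_enc_mu = self.rng.normal(0, s, (tgt_dim + obs_dim, z_dim))
        self.W_enc_lv = self.rng.normal(0, s, (tgt_dim + obs_dim, z_dim))
        # Conditional prior p(z | x_obs).
        self.W_pri_mu = self.rng.normal(0, s, (obs_dim, z_dim))
        self.W_pri_lv = self.rng.normal(0, s, (obs_dim, z_dim))
        # Decoder p(x_tgt | z, x_obs) with a fixed unit variance (an assumption).
        self.W_dec = self.rng.normal(0, s, (z_dim + obs_dim, tgt_dim))
        self.dec_logvar = np.zeros(tgt_dim)

    def encode(self, x_tgt, x_obs):
        h = np.concatenate([x_tgt, x_obs], axis=-1)
        return h @ self.W_enc_mu, h @ self.W_enc_lv

    def prior(self, x_obs):
        return x_obs @ self.W_pri_mu, x_obs @ self.W_pri_lv

    def decode(self, z, x_obs):
        return np.concatenate([z, x_obs], axis=-1) @ self.W_dec

    def elbo(self, x_tgt, x_obs):
        """Single-sample ELBO: E_q[log p(x_tgt | z, x_obs)] - KL(q || p(z | x_obs))."""
        q_mu, q_lv = self.encode(x_tgt, x_obs)
        p_mu, p_lv = self.prior(x_obs)
        # Reparameterization trick: z = mu + sigma * eps.
        z = q_mu + np.exp(0.5 * q_lv) * self.rng.standard_normal(q_mu.shape)
        rec = gaussian_log_prob(x_tgt, self.decode(z, x_obs), self.dec_logvar)
        # Closed-form KL divergence between two diagonal Gaussians.
        kl = 0.5 * np.sum(
            p_lv - q_lv + (np.exp(q_lv) + (q_mu - p_mu) ** 2) / np.exp(p_lv) - 1.0, axis=-1
        )
        return rec - kl

    def _decode_from_prior(self, x_obs, n):
        """Draw n latents from p(z | x_obs) and return the decoder means."""
        p_mu, p_lv = self.prior(x_obs)
        z = p_mu + np.exp(0.5 * p_lv) * self.rng.standard_normal((n, p_mu.shape[-1]))
        return self.decode(z, np.broadcast_to(x_obs, (n, x_obs.shape[-1])))

    def sample(self, x_obs, n):
        """Draw n augmented target-modality samples for one observed vector x_obs."""
        mean = self._decode_from_prior(x_obs, n)
        return mean + np.exp(0.5 * self.dec_logvar) * self.rng.standard_normal(mean.shape)

    def log_likelihood(self, x_tgt, x_obs, n=256):
        """Monte Carlo estimate of log p(x_tgt | x_obs), used here as a confidence score."""
        lp = gaussian_log_prob(x_tgt, self._decode_from_prior(x_obs, n), self.dec_logvar)
        m = lp.max()  # log-mean-exp, shifted for numerical stability
        return m + np.log(np.mean(np.exp(lp - m)))


# Toy usage: draw augmentations for one observed vector, then rank them by score.
model = LinearConditionalVAE(obs_dim=8, tgt_dim=4, z_dim=2)
x_obs = np.ones(8)
augmented = model.sample(x_obs, n=5)
scores = np.array([model.log_likelihood(x, x_obs) for x in augmented])
ranked = augmented[np.argsort(-scores)]  # highest estimated likelihood first
```

In a real system the linear maps would be neural networks trained by maximizing the ELBO. The abstract's joint update with a downstream pipeline (for example, a VQA model consuming the augmented samples) would add that model's loss to the objective. This coupling is not shown here.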