Data augmentation has been shown to improve the performance of multimodal machine learning models. This paper introduces a generative model for data augmentation that leverages the correlations among multiple modalities. Unlike conventional data augmentation approaches, which apply low-level operations with deterministic heuristics, our method learns a generator that produces samples of the target modality conditioned on the observed modalities, within the variational auto-encoder framework. In addition, the proposed model can quantify the confidence of augmented data through its generative probability, and it can be jointly optimised with a downstream task. Experiments with Visual Question Answering as the downstream task demonstrate the effectiveness of the proposed generative model, which improves strong UpDn-based models to achieve state-of-the-art performance.
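For orientation, a conditional generator in the variational auto-encoder framework is usually trained by maximising the standard conditional evidence lower bound. The sketch below is that generic bound, not necessarily the exact objective used in this paper. All notation is assumed here for illustration: x is the target modality, c the observed modalities, z a latent variable, θ and φ the generator and inference parameters, and λ a hypothetical trade-off weight.

```latex
% Generic conditional ELBO (assumed notation, not taken from the paper)
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\big)
  \;=\; \mathrm{ELBO}(x, c)

% One common form of joint training with a downstream loss (lambda is hypothetical)
\mathcal{L}_{\text{joint}} \;=\; \mathcal{L}_{\text{task}} \;-\; \lambda \, \mathrm{ELBO}(x, c)
```

Under these assumptions, the lower bound doubles as a tractable proxy for the generative probability of an augmented sample, which is one way a confidence score of the kind described above could be obtained.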