Data Augmentation (DA) -- generating extra training samples beyond the original training set -- has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples either by editing some visual regions/words or by re-generating them from scratch. However, these synthetic samples are often unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images with other human-written questions. Unfortunately, to guarantee that the augmented samples have reasonable ground-truth answers, it manually designs a set of heuristic rules for several question types, which severely limits its generalization ability. To this end, we propose a new Knowledge Distillation based Data Augmentation method for VQA, dubbed KDDAug. Specifically, we first relax the requirements for reasonable image-question pairs, so that pair composition can be easily applied to any question type. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architecture. Extensive ablation studies on multiple backbones and benchmarks demonstrate the effectiveness and generalization ability of KDDAug.
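The KD-based answer assignment above can be illustrated with a minimal sketch: each teacher model scores a composed image-question pair over the answer vocabulary, and the temperature-softened teacher distributions are averaged into a soft pseudo-answer label. This is only an illustrative sketch under assumed names (`kd_answer_assignment`, `teacher_logits_list`, `temperature`), not the paper's exact formulation, which combines in-domain and out-of-distribution teachers.

```python
import numpy as np

def kd_answer_assignment(teacher_logits_list, temperature=1.0):
    """Average temperature-softened teacher distributions into one
    soft pseudo-answer label for a composed image-question pair.

    Hypothetical sketch: argument names and the plain averaging rule
    are illustrative assumptions, not the paper's exact method.
    """
    probs = []
    for logits in teacher_logits_list:
        # Temperature-scaled, numerically stable softmax per teacher.
        z = np.asarray(logits, dtype=float) / temperature
        z -= z.max()
        p = np.exp(z)
        probs.append(p / p.sum())
    # Uniformly average the teachers' answer distributions.
    return np.mean(probs, axis=0)

# Two disagreeing teachers over a 2-answer vocabulary yield a soft,
# uncertainty-preserving pseudo label rather than a hard answer.
pseudo = kd_answer_assignment([[2.0, 0.0], [0.0, 2.0]])
```

Using a soft distribution as the pseudo answer (rather than an argmax) keeps the teachers' uncertainty in the training signal, which is what makes the assigned labels usable under both in-domain and out-of-distribution evaluation.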