The task of Visual Question Answering (VQA) is known to be plagued by models exploiting biases within the dataset to make their final predictions. Many previous ensemble-based debiasing methods have been proposed in which an additional model is purposefully trained to be biased in order to aid in training a robust target model. However, these methods compute the bias of a model from the label statistics of the training data or directly from single-modal branches. In contrast, in order to better learn the bias that a target VQA model suffers from, we propose a generative method, called GenB, that trains the bias model \emph{directly from the target model}. In particular, GenB employs a generative network to learn the bias through a combination of an adversarial objective and knowledge distillation. We then debias our target model using GenB as the bias model, and show through extensive experiments the effectiveness of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE.
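To illustrate the idea described above, the following is a minimal sketch (not the authors' released code) of how a bias model might be trained directly from a target VQA model with an adversarial objective and knowledge distillation. The module names TargetVQA-style interfaces (target_model, bias_model, discriminator), the loss weighting, and the use of a question-only bias branch are assumptions made purely for illustration.

    # Hypothetical sketch of a GenB-style bias-model training step in PyTorch.
    import torch
    import torch.nn.functional as F

    def genb_bias_step(target_model, bias_model, discriminator,
                       image, question, labels, opt_bias, opt_disc, kd_weight=1.0):
        # Teacher predictions from the (frozen for this step) target VQA model.
        with torch.no_grad():
            target_logits = target_model(image, question)

        # The bias model sees only the question, so it must capture the
        # question-conditioned bias of the target model.
        bias_logits = bias_model(question)

        # Discriminator step: tell target logits ("real") from bias logits ("fake").
        d_real = discriminator(target_logits, question)
        d_fake = discriminator(bias_logits.detach(), question)
        loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
                  F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

        # Bias-model step: fool the discriminator (adversarial objective) and
        # match the target's answer distribution (knowledge distillation),
        # alongside the usual VQA soft-label loss.
        d_fake = discriminator(bias_logits, question)
        loss_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
        loss_kd = F.kl_div(F.log_softmax(bias_logits, dim=-1),
                           F.softmax(target_logits, dim=-1), reduction="batchmean")
        loss_vqa = F.binary_cross_entropy_with_logits(bias_logits, labels)
        loss_b = loss_vqa + loss_adv + kd_weight * loss_kd
        opt_bias.zero_grad(); loss_b.backward(); opt_bias.step()
        return loss_b.item(), loss_d.item()

Under these assumptions, the resulting bias model can then serve as the "biased" branch in any standard ensemble-based debiasing objective for the target model; the exact debiasing loss used with GenB is described later in the paper.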