Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Cross-dataset adaptation methods make it possible to transfer knowledge from a source dataset with many training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset's training set fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, since there are many possible culprits, such as a mismatch in the image distribution or in the question distribution. At UCLA, we are working on a VQG (visual question generation) module that automatically generates out-of-distribution (OOD) shifts, which help systematically evaluate the cross-dataset adaptation capabilities of VQA models.
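To make the notion of a cross-dataset adaptation gap concrete, here is a minimal sketch of the evaluation it implies: score the same model on held-out data from the source and target datasets and compare. All names here (`predict`, `load`-style helpers, the toy data) are hypothetical placeholders for illustration, not part of any existing API or of our module.

```python
# Minimal sketch: quantifying a cross-dataset adaptation gap for a VQA model.
# The gap alone does not say *why* the model fails (image vs. question
# distribution mismatch) -- that diagnostic question is what systematically
# generated OOD shifts are meant to answer.
from typing import Callable, Iterable, Tuple

Example = Tuple[object, str, str]  # (image, question, ground-truth answer)


def accuracy(predict: Callable[[object, str], str],
             examples: Iterable[Example]) -> float:
    """Fraction of questions the model answers exactly correctly."""
    examples = list(examples)
    correct = sum(predict(img, q) == ans for img, q, ans in examples)
    return correct / max(len(examples), 1)


def adaptation_gap(predict: Callable[[object, str], str],
                   source_val: Iterable[Example],
                   target_val: Iterable[Example]) -> float:
    """In-domain accuracy minus out-of-domain accuracy; larger = worse transfer."""
    return accuracy(predict, source_val) - accuracy(predict, target_val)


if __name__ == "__main__":
    # Toy stand-in for a trained VQA model: always answers "yes".
    predict = lambda image, question: "yes"
    # Hypothetical held-out splits; a real setup would load e.g. two VQA datasets.
    source_val = [(None, "Is there a cat?", "yes")] * 8 + [(None, "Is it raining?", "no")] * 2
    target_val = [(None, "Is there a dog?", "yes")] * 4 + [(None, "Is it sunny?", "no")] * 6
    print(f"adaptation gap: {adaptation_gap(predict, source_val, target_val):.2f}")
```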