e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

Recently, there has been an increasing number of efforts to introduce models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing, because they can provide human-friendly and comprehensive explanations. However, there is a lack of comparison between existing methods, which is due to a lack of re-usable evaluation frameworks and a scarcity of datasets. In this work, we introduce e-ViL and e-SNLI-VE. e-ViL is a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. It spans four models and three datasets and both automatic metrics and human evaluation are used to assess model-generated explanations. e-SNLI-VE is currently the largest existing VL dataset with NLEs (over 430k instances). We also propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well-suited for text generation. It surpasses the previous state of the art by a large margin across all datasets. Code and data are available here: https://github.com/maximek3/e-ViL.
