VisQA: X-raying Vision and Language Reasoning in Transformers

Visual Question Answering (VQA) systems aim to answer open-ended textual questions about input images. They are a testbed for learning high-level reasoning, with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers by exploiting biases and shortcuts in the training data, sometimes without even looking at the input image, instead of performing the required reasoning steps. We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation. It exposes the key element of state-of-the-art neural models: attention maps in transformers. Our working hypothesis is that the reasoning steps leading to model predictions are observable from attention distributions, which are particularly useful for visualization. The design process of VisQA was motivated by well-known bias examples from the fields of deep learning and vision-language reasoning, and the tool was evaluated in two ways. First, as the result of a collaboration across three fields (machine learning, vision and language reasoning, and data analytics), the work led to a direct impact on the design and training of a neural model for VQA, improving model performance as a consequence. Second, we report on the design of VisQA and on a goal-oriented evaluation of VisQA with multiple experts, targeting the analysis of a model's decision process and providing evidence that it makes the inner workings of models accessible to users.
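To make the notion of an attention distribution concrete, the following is a minimal sketch of standard scaled dot-product attention weights, written in plain Python. It is not VisQA's implementation or the model studied in the paper; the function names, the toy embeddings, and the use of entropy as a summary of how focused a distribution is are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights: one distribution per query."""
    dim = len(keys[0])
    rows = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        rows.append(softmax(scores))
    return rows


def entropy(dist):
    """Shannon entropy in nats; low values mean attention is focused."""
    return -sum(p * math.log(p) for p in dist if p > 0)


# Hypothetical toy embeddings: 2 question tokens attending over 3 image regions.
question_tokens = [[1.0, 0.0], [0.0, 0.0]]
image_regions = [[4.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

weights = attention_map(question_tokens, image_regions)
for token_index, row in enumerate(weights):
    print(token_index, [round(w, 3) for w in row], round(entropy(row), 3))
```

In this toy setup the first token concentrates most of its attention on the first region, while the second token spreads its attention uniformly. A visualization of attention maps is meant to make this kind of contrast visible at a glance.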

