AdaVQA: 克服调整后边缘余量损失的语言前期语言 (AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss) - 专知论文

会员服务 ·

0

视觉问答 · 特征空间 · 余弦 · 边缘化 · MoDELS ·

2021 年 5 月 5 日

AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

翻译：AdaVQA: 克服调整后边缘余量损失的语言前期语言

Yangyang Guo,Liqiang Nie,Zhiyong Cheng,Feng Ji,Ji Zhang,Alberto Del Bimbo

A number of studies point out that current Visual Question Answering (VQA) models are severely affected by the language prior problem, which refers to blindly making predictions based on the language shortcut. Some efforts have been devoted to overcoming this issue with delicate models. However, there is no research to address it from the angle of the answer feature space learning, despite of the fact that existing VQA methods all cast VQA as a classification task. Inspired by this, in this work, we attempt to tackle the language prior problem from the viewpoint of the feature space learning. To this end, an adapted margin cosine loss is designed to discriminate the frequent and the sparse answer feature space under each question type properly. As a result, the limited patterns within the language modality are largely reduced, thereby less language priors would be introduced by our method. We apply this loss function to several baseline models and evaluate its effectiveness on two VQA-CP benchmarks. Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15\% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of the answer feature space learning.

翻译：一些研究指出,目前的视觉问题解答模式受到先前语言问题(即盲目地根据语言快捷键作出预测)的严重影响,一些努力致力于用微妙的模式克服这一问题,然而,尽管现有的视觉问题解答模式方法都使VQA成为分类任务,但没有从答案空间学习的视角研究解决这一问题,因此,在这项工作的启发下,我们试图从特征空间学习的角度处理先前语言问题。为此,调整的余弦调损失旨在适当区分每个问题类型下的频繁和稀少的回答特征空间。因此,语言模式中有限的模式大为减少,因此,我们的方法将减少先前的语言。我们将这一损失功能应用于几个基线模型,并评估其在两个VQA-CP基准上的有效性。实验结果表明,我们调整的差量损失能够大大加强基线模型,平均取得绝对性能收益15 ⁇,从回答空间特征的角度有力地核查解决VQA先前语言问题的可能性。

0

相关内容

视觉问答

视觉问答（Visual Question Answering，VQA），是一种涉及计算机视觉和自然语言处理的学习任务。这一任务的定义如下： A VQA system takes as input an image and a free-form, open-ended, natural-language question about the image and produces a natural-language answer as the output[1]。翻译为中文：一个VQA系统以一张图片和一个关于这张图片形式自由、开放式的自然语言问题作为输入，以生成一条自然语言答案作为输出。简单来说，VQA就是给定的图片进行问答。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

专知会员服务

91+阅读 · 2020年7月23日

【CMU-TACL2020】低资源跨语言实体链接，Low-resource Crosslingual EntityLinking

专知会员服务

17+阅读 · 2020年3月29日

【牛津DeepMind】从Word2Vec到BERT:上下文嵌入(Contextual Embeddings)综述论文

【牛津DeepMind】从Word2Vec到BERT:上下文嵌入(Contextual Embeddings)综述论文

专知会员服务

85+阅读 · 2020年3月18日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

专知会员服务

10+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇视觉问答（VQA）相关论文—融合算子、问题类型引导注意力、交互环境、可解释性、稠密对称联合注意力

【论文推荐】最新七篇视觉问答（VQA）相关论文—融合算子、问题类型引导注意力、交互环境、可解释性、稠密对称联合注意力

专知

7+阅读 · 2018年4月20日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

【论文推荐】最新七篇目标检测相关论文—Self Paced、上下文注意力、特征反射、层次特征、Tiny SSD、少样本、协同学习

【论文推荐】最新七篇目标检测相关论文—Self Paced、上下文注意力、特征反射、层次特征、Tiny SSD、少样本、协同学习

专知

6+阅读 · 2018年2月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Masked Proxy Loss For Text-Independent Speaker Verification

Arxiv

0+阅读 · 2021年6月25日

RikoNet: A Novel Anime Recommendation Engine

RikoNet: A Novel Anime Recommendation Engine

Arxiv

0+阅读 · 2021年6月24日

Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Arxiv

1+阅读 · 2021年6月23日

Closed-Form, Provable, and Robust PCA via Leverage Statistics and Innovation Search

Arxiv

0+阅读 · 2021年6月23日

Multi-Modal Answer Validation for Knowledge-Based VQA

Arxiv

6+阅读 · 2021年3月23日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation

Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation

Arxiv

8+阅读 · 2021年1月5日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

VQA: Visual Question Answering

Arxiv

9+阅读 · 2016年10月27日

VIP会员

文章信息

相关主题

相关VIP内容

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

专知会员服务

91+阅读 · 2020年7月23日

【CMU-TACL2020】低资源跨语言实体链接，Low-resource Crosslingual EntityLinking

专知会员服务

17+阅读 · 2020年3月29日

【牛津DeepMind】从Word2Vec到BERT:上下文嵌入(Contextual Embeddings)综述论文

【牛津DeepMind】从Word2Vec到BERT:上下文嵌入(Contextual Embeddings)综述论文

专知会员服务

85+阅读 · 2020年3月18日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

专知会员服务

10+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇视觉问答（VQA）相关论文—融合算子、问题类型引导注意力、交互环境、可解释性、稠密对称联合注意力

【论文推荐】最新七篇视觉问答（VQA）相关论文—融合算子、问题类型引导注意力、交互环境、可解释性、稠密对称联合注意力

专知

7+阅读 · 2018年4月20日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

【论文推荐】最新七篇目标检测相关论文—Self Paced、上下文注意力、特征反射、层次特征、Tiny SSD、少样本、协同学习

【论文推荐】最新七篇目标检测相关论文—Self Paced、上下文注意力、特征反射、层次特征、Tiny SSD、少样本、协同学习

专知

6+阅读 · 2018年2月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Masked Proxy Loss For Text-Independent Speaker Verification

Arxiv

0+阅读 · 2021年6月25日

RikoNet: A Novel Anime Recommendation Engine

RikoNet: A Novel Anime Recommendation Engine

Arxiv

0+阅读 · 2021年6月24日

Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Arxiv

1+阅读 · 2021年6月23日

Closed-Form, Provable, and Robust PCA via Leverage Statistics and Innovation Search

Arxiv

0+阅读 · 2021年6月23日

Multi-Modal Answer Validation for Knowledge-Based VQA

Arxiv

6+阅读 · 2021年3月23日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation

Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation

Arxiv

8+阅读 · 2021年1月5日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

VQA: Visual Question Answering

Arxiv

9+阅读 · 2016年10月27日

微信扫码咨询专知VIP会员