An increasing number of studies have investigated the decision-making process of VQA models. Many of these studies focus on the reasons behind the correct answers a model chooses, yet why a model is lured into choosing a particular distracting answer has rarely been studied. To this end, we introduce a novel task, \textit{textual Distractor Generation for VQA} (DG-VQA), that explains the decision boundaries of existing VQA models. The goal of DG-VQA is to generate the most confusing set of textual distractors for multi-choice VQA tasks, exposing the vulnerability of existing models (i.e., generating distractors that lure existing models into failing). We show that DG-VQA can be formulated as a Markov Decision Process, and present a reinforcement learning solution that produces distractors in an unsupervised manner, thereby addressing the lack of large annotated corpora that limited previous distractor generation methods. Our proposed model receives reward signals from fully trained multi-choice VQA models and updates its parameters via policy gradient. Empirical results show that the generated textual distractors successfully attack several popular VQA models, causing an average accuracy drop of $20\%$ from $64\%$. Furthermore, we conduct adversarial training to improve the robustness of VQA models by incorporating the generated distractors. Empirical results validate the effectiveness of adversarial training, showing a performance improvement of $27\%$ on the multi-choice VQA task.
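To make the policy-gradient training signal concrete, the sketch below shows one possible REINFORCE-style update in which the frozen, fully trained VQA model's probability mass on a sampled distractor serves as the reward. The \texttt{generator.sample} and \texttt{vqa\_model} call signatures are illustrative assumptions for this sketch, not the paper's actual interfaces.

\begin{verbatim}
import torch
import torch.nn.functional as F

def reinforce_step(generator, vqa_model, image_feat, question,
                   correct_answer, optimizer):
    """One policy-gradient update for the distractor generator.

    Assumed (hypothetical) interfaces:
      generator.sample(...) -> (distractor_tokens, per-token log-probs)
      vqa_model(image_feat, question, candidates) -> scores over candidates
    The VQA model is frozen; only the generator is updated.
    """
    # Sample a textual distractor and keep its log-probabilities.
    distractor, log_probs = generator.sample(image_feat, question)

    # Reward: probability mass the frozen VQA model puts on the distractor,
    # i.e. how strongly it is lured away from the correct answer.
    with torch.no_grad():
        scores = vqa_model(image_feat, question, [correct_answer, distractor])
        reward = F.softmax(scores, dim=-1)[1]

    # REINFORCE: maximize expected reward by minimizing -reward * log pi.
    loss = -(reward * log_probs.sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()
\end{verbatim}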