Transformer模块网络用于视觉问答中的系统化泛化 (Transformer Module Networks for Systematic Generalization in Visual Question Answering) - 专知论文

会员服务 ·

0

Performer · 泛化理论 · 变换 · 视觉问答 · Networking ·

2023 年 3 月 17 日

Transformer Module Networks for Systematic Generalization in Visual Question Answering

翻译：Transformer模块网络用于视觉问答中的系统化泛化

Moyuru Yamada,Vanessa D'Amario,Kentaro Takemoto,Xavier Boix,Tomotake Sasaki

Transformers achieve great performance on Visual Question Answering (VQA). However, their systematic generalization capabilities, i.e., handling novel combinations of known concepts, is unclear. We reveal that Neural Module Networks (NMNs), i.e., question-specific compositions of modules that tackle a sub-task, achieve better or similar systematic generalization performance than the conventional Transformers, even though NMNs' modules are CNN-based. In order to address this shortcoming of Transformers with respect to NMNs, in this paper we investigate whether and how modularity can bring benefits to Transformers. Namely, we introduce Transformer Module Network (TMN), a novel NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art systematic generalization performance in three VQA datasets, improving more than 30% over standard Transformers for novel compositions of sub-tasks. We show that not only the module composition but also the module specialization for each sub-task are the key of such performance gain.

翻译：Transformer在视觉问答（VQA）上取得了极佳的成绩。然而，它们的系统化泛化能力，即处理已知概念的新组合，尚不清楚。我们揭示神经模块网络（NMNS）比传统的Transformer取得更好或相似的系统化泛化性能，即使NMNs的模块是基于CNN的。为了解决Transformer与NMNs相比存在的不足，本文研究模块性如何为Transformer带来好处。即，我们引入了Transformer模块网络（TMN），它是基于Transformer模块组成的新型NMN。TMN在三个VQA数据集中实现了最先进的系统化泛化性能，在处理子任务的新组合时比标准Transformer提高了30%以上。我们展示了模块组合以及为每个子任务特殊化的模块都是这种性能提升的关键。

0

相关内容

Performer

最新《计算机视觉领域泛化Domain Generalization》综述论文，18页pdf229篇文献

专知会员服务

57+阅读 · 2021年7月27日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NeurIPS 2020】视觉和语言表示学习的大规模对抗性训练

【NeurIPS 2020】视觉和语言表示学习的大规模对抗性训练

专知会员服务

15+阅读 · 2020年10月27日

注意力图神经网络的小样本学习

注意力图神经网络的小样本学习

专知会员服务

192+阅读 · 2020年7月16日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新六篇视觉问答（VQA）相关论文—盲人问题、物体计数、多模态解释、视觉关系、对抗性网络、对偶循环注意力

【论文推荐】最新六篇视觉问答（VQA）相关论文—盲人问题、物体计数、多模态解释、视觉关系、对抗性网络、对偶循环注意力

专知

32+阅读 · 2018年2月28日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

Klotho抑制TRPC6诱导的足细胞损伤在糖尿病肾病中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

自修复且耐疲劳的高强度双网络水凝胶

国家自然科学基金

0+阅读 · 2015年12月31日

Mo(W)-S-Cu原子簇-有机框架材料的设计合成及异相催化性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

融合地物语义的多尺度对象马尔可夫模型的高分辨率遥感影像分割研究

国家自然科学基金

0+阅读 · 2012年12月31日

非凸稀疏先验图像恢复建模理论和算法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

新型有机硅接枝聚合物的制备及其用于酸碱蛋白质分离机理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于稀疏编码模型的深层学习神经网络

国家自然科学基金

7+阅读 · 2012年12月31日

基于深度学习的异构数据低维非线性表示

国家自然科学基金

1+阅读 · 2012年12月31日

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Arxiv

1+阅读 · 2023年5月7日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Deep Learning on Image Denoising: An overview

Arxiv

13+阅读 · 2020年8月3日

Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering

Arxiv

16+阅读 · 2019年12月16日

Self-Attention Graph Pooling

Self-Attention Graph Pooling

Arxiv

13+阅读 · 2019年6月13日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

VIP会员

文章信息

相关主题

相关VIP内容

最新《计算机视觉领域泛化Domain Generalization》综述论文，18页pdf229篇文献

专知会员服务

57+阅读 · 2021年7月27日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NeurIPS 2020】视觉和语言表示学习的大规模对抗性训练

【NeurIPS 2020】视觉和语言表示学习的大规模对抗性训练

专知会员服务

15+阅读 · 2020年10月27日

注意力图神经网络的小样本学习

注意力图神经网络的小样本学习

专知会员服务

192+阅读 · 2020年7月16日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新六篇视觉问答（VQA）相关论文—盲人问题、物体计数、多模态解释、视觉关系、对抗性网络、对偶循环注意力

【论文推荐】最新六篇视觉问答（VQA）相关论文—盲人问题、物体计数、多模态解释、视觉关系、对抗性网络、对偶循环注意力

专知

32+阅读 · 2018年2月28日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

相关论文

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Arxiv

1+阅读 · 2023年5月7日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Deep Learning on Image Denoising: An overview

Arxiv

13+阅读 · 2020年8月3日

Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering

Arxiv

16+阅读 · 2019年12月16日

Self-Attention Graph Pooling

Self-Attention Graph Pooling

Arxiv

13+阅读 · 2019年6月13日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

相关基金

Klotho抑制TRPC6诱导的足细胞损伤在糖尿病肾病中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

自修复且耐疲劳的高强度双网络水凝胶

国家自然科学基金

0+阅读 · 2015年12月31日

Mo(W)-S-Cu原子簇-有机框架材料的设计合成及异相催化性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

融合地物语义的多尺度对象马尔可夫模型的高分辨率遥感影像分割研究

国家自然科学基金

0+阅读 · 2012年12月31日

非凸稀疏先验图像恢复建模理论和算法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

新型有机硅接枝聚合物的制备及其用于酸碱蛋白质分离机理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于稀疏编码模型的深层学习神经网络

国家自然科学基金

7+阅读 · 2012年12月31日

基于深度学习的异构数据低维非线性表示

国家自然科学基金

1+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员