Although convolutional neural networks (CNNs) have shown remarkable results in many vision tasks, they still struggle with simple yet challenging visual reasoning problems. Inspired by the recent success of the Transformer network in computer vision, in this paper we introduce the Recurrent Vision Transformer (RViT) model. By leveraging recurrent connections and spatial attention for reasoning tasks, this network achieves competitive results on the same-different visual reasoning problems from the SVRT dataset. Weight sharing in both the spatial and depth dimensions regularizes the model, allowing it to learn with far fewer free parameters and only 28k training samples. A comprehensive ablation study confirms the importance of the hybrid CNN + Transformer architecture and the role of the feedback connections, which iteratively refine the internal representation until a stable prediction is obtained. Ultimately, this study lays the basis for a deeper understanding of the role of attention and recurrent connections in solving visual abstract reasoning tasks.
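To make the described architecture concrete, the following is a minimal sketch, assuming PyTorch, of the core idea: a convolutional stem produces a grid of spatial tokens, and a single Transformer layer with shared weights is applied recurrently, acting as a feedback loop that refines the internal representation step by step. All module names, hyperparameters, and the readout scheme here are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the RViT idea (not the authors' code): a hybrid
# CNN + Transformer where one Transformer layer is reused across steps,
# i.e., weight sharing in the depth dimension. All sizes are assumptions.
import torch
import torch.nn as nn

class RecurrentViTSketch(nn.Module):
    def __init__(self, dim=128, heads=4, steps=8, num_classes=2):
        super().__init__()
        # Hybrid front-end: a small convolutional stem extracts spatial tokens.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )
        # A single Transformer layer applied recurrently: the same weights
        # process the state at every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True,
        )
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, num_classes)
        self.steps = steps

    def forward(self, x):
        b = x.size(0)
        # (B, C, H, W) -> (B, H*W, dim): flatten the feature map into tokens.
        tokens = self.stem(x).flatten(2).transpose(1, 2)
        h = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        # Feedback loop: the shared block iteratively refines the internal
        # representation; the class token is read out after the final step.
        for _ in range(self.steps):
            h = self.block(h)
        return self.head(h[:, 0])
```

In this reading, the recurrence replaces a stack of independent layers, which is what keeps the free-parameter count low relative to a standard ViT of the same effective depth.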