Understanding the computational demands underlying visual reasoning - 专知论文 (Zhuanzhi Papers)

August 8, 2021

Understanding the computational demands underlying visual reasoning

Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the Synthetic Visual Reasoning Test (SVRT) challenge, a collection of twenty-three visual reasoning problems. Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different vs. spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in human visual reasoning ability. To test this, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of experiments, we evaluated the ability of these attention networks to learn to solve the SVRT challenge and found the resulting architectures to be much more efficient at solving the hardest of these visual reasoning tasks. Most importantly, the corresponding improvements on individual tasks partially explained the taxonomy. Overall, this work advances our understanding of visual reasoning and yields testable neuroscience predictions regarding the need for feature-based vs. spatial attention in visual reasoning.
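The distinction the abstract draws between feature-based and spatial attention can be illustrated with a minimal sketch. This is not the authors' architecture: the function names `feature_attention` and `spatial_attention` are hypothetical, and the gates here are parameter-free (a sigmoid of an average), whereas attention modules in trained networks typically use learned weights. The sketch only shows where each kind of gate is applied: feature-based attention rescales whole channels, while spatial attention rescales whole locations.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a simple gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_attention(fmap):
    """Feature-based (channel-wise) gating, illustrative only.

    fmap is a list of C channels, each an H x W list of lists.
    Each channel gets ONE gate, computed from its mean activation,
    and that gate multiplies every location in the channel.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Spatial (location-wise) gating, illustrative only.

    Each location (i, j) gets ONE gate, computed from the mean
    activation across channels, and that gate multiplies every
    channel at that location.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            gate = sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
            for k in range(c):
                out[k][i][j] = fmap[k][i][j] * gate
    return out


# Toy feature map with C=2 channels, H=1, W=2 (values are made up).
toy_map = [[[1.0, 3.0]], [[1.0, -3.0]]]
by_feature = feature_attention(toy_map)
by_location = spatial_attention(toy_map)
```

In the toy map, feature-based gating preserves the ratio between locations inside a channel, while spatial gating preserves the ratio between channels at a location; which of the two matters for a given task is the kind of question the abstract's experiments address.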
