Transformation Driven Visual Reasoning - Zhuanzhi (专知) Papers



November 26, 2020

Transformation Driven Visual Reasoning


Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng

This paper defines a new visual reasoning paradigm by introducing an important factor, i.e., transformation. The motivation comes from the fact that most existing visual reasoning tasks, such as CLEVR in VQA, are solely defined to test how well the machine understands the concepts and relations within static settings, like one image. We argue that this kind of state driven visual reasoning approach has limitations in reflecting whether the machine has the ability to infer the dynamics between different states, which has been shown to be as important as state-level reasoning for human cognition in Piaget's theory. To tackle this problem, we propose a novel transformation driven visual reasoning task. Given both the initial and final states, the target is to infer the corresponding single-step or multi-step transformation, represented as a triplet (object, attribute, value) or a sequence of triplets, respectively. Following this definition, a new dataset, namely TRANCE, is constructed on the basis of CLEVR, including three levels of settings, i.e., Basic (single-step transformation), Event (multi-step transformation), and View (multi-step transformation with variant views). Experimental results show that the state-of-the-art visual reasoning models perform well on Basic, but are still far from human-level intelligence on Event and View. We believe the proposed new paradigm will boost the development of machine visual reasoning. More advanced methods and real data need to be investigated in this direction. Code is available at: https://github.com/hughplay/TVR.
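To make the task definition concrete, here is a minimal Python sketch of the output format described above: a transformation is a triplet (object, attribute, value), and a multi-step answer is a sequence of such triplets. It assumes the initial and final states are already available in symbolic form, as a hypothetical mapping from object id to attribute values; in the actual task the states are given as images, and this sketch skips perception entirely. The names `Transformation`, `apply`, `infer_single_step`, and `is_consistent`, as well as the attribute names in the example, are illustrative and are not taken from the TRANCE dataset or the TVR repository.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical symbolic scene state: object id -> {attribute: value}.
# This is NOT the TRANCE file format; real inputs are rendered images.
State = Dict[int, Dict[str, str]]


@dataclass(frozen=True)
class Transformation:
    """One step as a triplet: set `attribute` of object `obj` to `value`."""
    obj: int
    attribute: str
    value: str


def apply(state: State, steps: List[Transformation]) -> State:
    """Apply a sequence of triplets in order and return a new state."""
    new_state = {oid: dict(attrs) for oid, attrs in state.items()}  # copy, keep input intact
    for t in steps:
        new_state[t.obj][t.attribute] = t.value
    return new_state


def infer_single_step(initial: State, final: State) -> Transformation:
    """Basic-style inference on symbolic states: find the single changed attribute."""
    diffs = [
        Transformation(oid, attr, final[oid][attr])
        for oid, attrs in initial.items()
        for attr, val in attrs.items()
        if final[oid][attr] != val
    ]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one change, found {len(diffs)}")
    return diffs[0]


def is_consistent(initial: State, final: State, steps: List[Transformation]) -> bool:
    """Check whether a predicted sequence turns the initial state into the final one."""
    return apply(initial, steps) == final


if __name__ == "__main__":
    initial = {
        0: {"color": "red", "shape": "cube", "size": "large"},
        1: {"color": "blue", "shape": "sphere", "size": "small"},
    }
    final = apply(initial, [Transformation(1, "color", "green")])
    print(infer_single_step(initial, final))
```

Because different step orders, or different sequences altogether, can lead to the same final state, checking whether a predicted sequence reproduces the final state (as `is_consistent` does) is one plausible way to judge a multi-step answer; this is an assumption for illustration, not necessarily the evaluation protocol used in the paper.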

