Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture


November 11, 2021


Michael Yang, Aditya Anantharaman, Zachary Kitowski, Derik Clive Robert

From arXiv. Presented as a poster at the CVPR 2021 Visual Question Answering Workshop.

Previous studies such as VizWiz find that Visual Question Answering (VQA) systems that can read and reason about text in images are useful in application areas such as assisting visually impaired people. TextVQA is a VQA dataset geared toward this problem: its questions require answering systems to read and reason about both visual objects and text objects in images. One key challenge in TextVQA is designing a system that effectively reasons not only about visual and text objects individually, but also about the spatial relationships between these objects. This motivates the use of 'edge features', that is, information about the relationship between each pair of objects. Some current TextVQA models address this problem, but they either use only categories of relations (rather than edge feature vectors) or do not use edge features within the Transformer architecture. To overcome these shortcomings, we propose the Graph Relation Transformer (GRT), which uses edge information in addition to node information for graph attention computation in the Transformer. We find that, without any other optimizations, the proposed GRT method outperforms the M4C baseline model in accuracy by 0.65% on the validation set and 0.57% on the test set. Qualitatively, we observe that GRT has superior spatial reasoning ability to M4C.
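The abstract says only that GRT feeds edge information into the Transformer's attention alongside node information. As a rough illustration, the NumPy sketch below builds one common kind of spatial edge feature from bounding boxes (relative center offsets and log size ratios) and injects it into single-head self-attention by adding a projected edge vector to each key, in the style of relative-position attention. The feature choice, the names `pairwise_box_features` and `edge_aware_attention`, and the key-augmentation scheme are all hypothetical; this is not the authors' GRT formulation, which the abstract does not specify.

```python
import numpy as np


def pairwise_box_features(boxes):
    """Hypothetical spatial edge features for every ordered pair (i, j).

    boxes: (N, 4) array of [x1, y1, x2, y2] with positive width and height.
    Returns an (N, N, 4) array: center offsets of box j relative to box i
    (normalized by box i's size) and log width/height ratios.
    """
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    dx = (cx[None, :] - cx[:, None]) / w[:, None]
    dy = (cy[None, :] - cy[:, None]) / h[:, None]
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)


def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def edge_aware_attention(nodes, edges, Wq, Wk, Wv, We):
    """Single-head self-attention whose logits use node AND edge features.

    nodes: (N, d) node features; edges: (N, N, e) edge features.
    Wq, Wk, Wv: (d, d_k) node projections; We: (e, d_k) edge projection.
    Assumed scheme: logit_ij = q_i . (k_j + We^T e_ij) / sqrt(d_k).
    """
    q = nodes @ Wq                      # (N, d_k)
    k = nodes @ Wk                      # (N, d_k)
    v = nodes @ Wv                      # (N, d_k)
    r = edges @ We                      # (N, N, d_k) projected edge vectors
    node_term = q @ k.T                 # (N, N) standard dot-product term
    edge_term = np.einsum("id,ijd->ij", q, r)  # (N, N) pairwise edge term
    attn = softmax((node_term + edge_term) / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn


# Usage with random weights: 3 detected objects, 8-dim node features.
rng = np.random.default_rng(0)
boxes = np.array([[10, 10, 50, 40], [60, 15, 100, 45], [20, 60, 80, 90]], dtype=float)
nodes = rng.normal(size=(3, 8))
edges = pairwise_box_features(boxes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
We = rng.normal(size=(4, 8))
out, attn = edge_aware_attention(nodes, edges, Wq, Wk, Wv, We)
print(out.shape)  # (3, 8)
```

Setting `We` to zero recovers ordinary scaled dot-product attention, so in this sketch the edge term acts as a learned, pair-specific correction to node-only attention. Using continuous edge vectors rather than a small set of relation categories mirrors the distinction the abstract draws between GRT and earlier TextVQA models.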

