CPTR: Full Transformer Network for Image Captioning - 专知 (Zhuanzhi) Papers

January 27, 2021

CPTR: Full Transformer Network for Image Captioning

Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu

In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR), which takes the sequentialized raw images as the input to the Transformer. Compared to the "CNN+Transformer" design paradigm, our model can model global context at every encoder layer from the beginning and is totally convolution-free. Extensive experiments demonstrate the effectiveness of the proposed model, and we surpass the conventional "CNN+Transformer" methods on the MSCOCO dataset. In addition, thanks to the full Transformer architecture, we provide detailed visualizations of the self-attention between patches in the encoder and the "words-to-patches" attention in the decoder.
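As a rough illustration of the abstract's two ideas, here is a toy sketch that is not the authors' code. It turns a raw image into a sequence of flattened patches that a Transformer can consume, and it shows that encoder self-attention relates every patch to every other patch from the first layer on. It then computes decoder-style "words-to-patches" cross-attention weights. Several things are assumptions for illustration only: pure-Python lists instead of a tensor library, a single attention head with no learned query/key/value projections, random weights, random stand-in "word" vectors, and toy sizes. The helper names (`sequentialize`, `attention`, `rand_matrix`) are hypothetical.

```python
import math
import random


def sequentialize(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping patch x patch
    blocks and flatten each block into one vector of length patch * patch * C."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    seq = []
    for top in range(0, h, patch):            # visit patches in raster (row-major) order
        for left in range(0, w, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])   # all channel values of pixel (r, c)
            seq.append(vec)
    return seq


def matmul(a, b):
    """(n x k) times (k x m) for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)                              # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention; returns (outputs, weights).
    Simplification (assumption): no learned query/key/value projections."""
    d = len(keys[0])
    scores = [[sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys]
              for qv in queries]
    weights = [softmax(row) for row in scores]
    return matmul(weights, values), weights


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, W, C, P, D = 4, 4, 3, 2, 8                 # toy sizes, not CPTR's configuration

image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
patches = sequentialize(image, P)             # 4 patches, each of length P * P * C = 12

# Linear patch embedding plus a position embedding (random here, learned in practice)
proj = rand_matrix(P * P * C, D, rng)
pos = rand_matrix(len(patches), D, rng)
tokens = [[e + p for e, p in zip(row, prow)]
          for row, prow in zip(matmul(patches, proj), pos)]

# Encoder self-attention: every patch attends to all patches, already in the first layer
enc_out, patch_to_patch = attention(tokens, tokens, tokens)

# Decoder cross-attention: hypothetical word embeddings attend to the encoded patches
words = rand_matrix(3, D, rng)                # stand-ins for 3 caption tokens
_, words_to_patches = attention(words, enc_out, enc_out)

print(len(patches), len(patches[0]))          # 4 12
print(len(words_to_patches), len(words_to_patches[0]))  # 3 4
```

Each row of `words_to_patches` sums to 1 and records how strongly one word attends to each image patch. Reshaping a row back onto the (H/P) x (W/P) patch grid gives a heat map of the kind the abstract calls "words-to-patches" attention. A full Transformer encoder-decoder would also use multiple heads, learned projections, feed-forward sublayers, stacked layers, and masked self-attention among the caption tokens.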

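Related content

Image Captioning

Image captioning is the process of generating a textual description of an image, based mainly on the objects in the image and the actions they perform.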

Related VIP content

Latest "Transformers Models" tutorial, 64-page slides (Zhuanzhi Member Services, November 26, 2020)

[SenseTime] Deformable Transformers for end-to-end object detection, Deformable DETR (Zhuanzhi Member Services, October 11, 2020)

A collection of image classification tricks, 17-page slides: "Bag of Tricks for Image Classification" (Zhuanzhi Member Services, March 12, 2020)

[Peking University] Exploring and Distilling Cross-Modal Information for Image Captioning (Zhuanzhi Member Services, March 3, 2020)

Transformer text classification code (Zhuanzhi Member Services, February 3, 2020)

[NeurIPS2019] Graph Transformer Network (Zhuanzhi Member Services, November 25, 2019)

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis, Prof. Xiaojun Quan, School of Data and Computer Science, Sun Yat-sen University, 8th National Conference on Social Media Processing (SMP2019) (Zhuanzhi Member Services, October 22, 2019)

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation (Zhuanzhi Member Services, October 17, 2019)

Stabilizing Transformers for Reinforcement Learning (Zhuanzhi Member Services, October 17, 2019)

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation (Zhuanzhi Member Services, October 17, 2019)

Related news

[NeurIPS2019] Graph Transformer Network (Zhuanzhi, November 18, 2019)

[Latest object detection paper] Matrix Nets: a new deep architecture for object detection (Zhuanzhi, August 16, 2019)

Hierarchically Structured Meta-learning (CreateAMind, May 22, 2019)

CornerNet: Detecting Objects as Paired Keypoints, paper notes (Statistical Learning and Visual Computing Group, September 27, 2018)

Applying adaptive attention mechanisms to image captioning (PaperWeekly, May 10, 2018)

NIPS 2017 paper walkthrough | Image captioning based on contrastive learning (PaperWeekly, February 28, 2018)

Which CVPR 2017 image captioning papers are worth reading? (PaperWeekly, November 29, 2017)

ICCV17: 12 top experts teach you generative adversarial networks (GANs)! (Global AI, November 26, 2017)

An analysis of Capsule Networks (Machine Learning Research Society, November 12, 2017)

[Learning] Hierarchical Softmax (Machine Learning Research Society, August 6, 2017)

Related papers

Scene-based Factored Attention for Image Captioning (arXiv, August 7, 2019)

Image Captioning: Transforming Objects into Words (arXiv, June 14, 2019)

A sequential guiding network with attention for image captioning (arXiv, February 8, 2019)

Exploring Visual Relationship for Image Captioning (arXiv, September 19, 2018)

Recurrent Fusion Network for Image Captioning (arXiv, July 31, 2018)

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention (arXiv, July 29, 2018)

CNN+CNN: Convolutional Decoders for Image Captioning (arXiv, May 23, 2018)

Improving Image Captioning with Conditional Generative Adversarial Nets (arXiv, May 18, 2018)

Image Captioning (arXiv, May 13, 2018)

Reconstruction Network for Video Captioning (arXiv, March 30, 2018)
