访问更多图像描述时间 (Attend More Times for Image Captioning) - 专知论文

会员服务 ·

0

图像字幕 · ONCE · INFORMS · 注意力机制 · MoDELS ·

2018 年 12 月 8 日

Attend More Times for Image Captioning

翻译：访问更多图像描述时间

Jiajun Du,Yu Qin,Hongtao Lu,Yonghua Zhang

Most attention-based image captioning models attend to the image once per word. However, attending once per word is rigid and is easy to miss some information. Attending more times can adjust the attention position, find the missing information back and avoid generating the wrong word. In this paper, we show that attending more times per word can gain improvements in the image captioning task. We propose a flexible two-LSTM merge model to make it convenient to encode more attentions than words. Our captioning model uses two LSTMs to encode the word sequence and the attention sequence respectively. The information of the two LSTMs and the image feature are combined to predict the next word. Experiments on the MSCOCO caption dataset show that our method outperforms the state-of-the-art. Using bottom up features and self-critical training method, our method gets BLEU-4, METEOR, ROUGE-L and CIDEr scores of 0.381, 0.283, 0.580 and 1.261 on the Karpathy test split.

翻译：多数关注的图像字幕模型对图像每个单词都关注一次。但是, 一次访问每个单词是僵硬的, 容易丢失某些信息。更多时间的处理可以调整关注位置, 找到缺失的信息, 避免生成错误的单词。在本文中, 我们显示每个单词多关注一次可以改进图像字幕任务。我们提议一个灵活的两个LSTM 合并模型, 以便于将更多的注意力比单词进行编码。我们的字幕模型使用两个LSTM 来分别对单词序列和注意序列进行编码。两个 LSTM 和图像特征的信息可以合并来预测下一个单词。对 MSCO 字幕数据集的实验显示, 我们的方法比最新工艺要好。使用自下而上的特点和自我批评的培训方法, 我们的方法在 Karpathic 测试分法上得到了 BLEU-4、 METEOR、 ROUGEL 和 CIDER 分数为0. 381、 0. 283、 0. 0. 580 和 1. 261 。

6

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

专知会员服务

44+阅读 · 2020年3月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Capsule Networks，胶囊网络，57页ppt，布法罗大学

Capsule Networks，胶囊网络，57页ppt，布法罗大学

专知会员服务

69+阅读 · 2020年2月29日

【医学图像分割| 2019新综述】生物医学图像分割的机器学习技术：技术方面综述和最新应用介绍（Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications），附35页PDF

【医学图像分割| 2019新综述】生物医学图像分割的机器学习技术：技术方面综述和最新应用介绍（Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications），附35页PDF

专知会员服务

57+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

视频摘要最新综述文章，Video Skimming: Taxonomy and Comprehensive Survey

视频摘要最新综述文章，Video Skimming: Taxonomy and Comprehensive Survey

专知会员服务

29+阅读 · 2019年10月13日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

NIPS 2017论文解读 | 基于对比学习的Image Captioning

NIPS 2017论文解读 | 基于对比学习的Image Captioning

PaperWeekly

6+阅读 · 2018年2月28日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

论文 | CVPR2017有哪些值得读的Image Caption论文？

论文 | CVPR2017有哪些值得读的Image Caption论文？

黑龙江大学自然语言处理实验室

16+阅读 · 2017年12月1日

Simple Recurrent Unit For Sentence Classification

Simple Recurrent Unit For Sentence Classification

哈工大SCIR

6+阅读 · 2017年11月29日

CVPR2017有哪些值得读的Image Caption论文？

CVPR2017有哪些值得读的Image Caption论文？

PaperWeekly

10+阅读 · 2017年11月29日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Improving Image Captioning by Leveraging Knowledge Graphs

Arxiv

8+阅读 · 2019年1月25日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning based on Deep Reinforcement Learning

Image Captioning based on Deep Reinforcement Learning

Arxiv

9+阅读 · 2018年9月13日

Recurrent Fusion Network for Image Captioning

Recurrent Fusion Network for Image Captioning

Arxiv

3+阅读 · 2018年7月31日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

VIP会员

文章信息

相关主题

注意力机制

相关VIP内容

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

【CVPR2020】用于图像超分辨率的深度展开网络，Deep Unfolding Network for Image Super-Resolution

专知会员服务

44+阅读 · 2020年3月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Capsule Networks，胶囊网络，57页ppt，布法罗大学

Capsule Networks，胶囊网络，57页ppt，布法罗大学

专知会员服务

69+阅读 · 2020年2月29日

【医学图像分割| 2019新综述】生物医学图像分割的机器学习技术：技术方面综述和最新应用介绍（Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications），附35页PDF

【医学图像分割| 2019新综述】生物医学图像分割的机器学习技术：技术方面综述和最新应用介绍（Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications），附35页PDF

专知会员服务

57+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

视频摘要最新综述文章，Video Skimming: Taxonomy and Comprehensive Survey

视频摘要最新综述文章，Video Skimming: Taxonomy and Comprehensive Survey

专知会员服务

29+阅读 · 2019年10月13日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

三维高斯泼溅应用综述：分割、编辑与生成

《多智能体不确定环境追逃博弈研究》216页

【博士论文】基于不确定性的可靠性：现代机器学习中的选择性预测与可信部署

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

相关资讯

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

NIPS 2017论文解读 | 基于对比学习的Image Captioning

NIPS 2017论文解读 | 基于对比学习的Image Captioning

PaperWeekly

6+阅读 · 2018年2月28日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

论文 | CVPR2017有哪些值得读的Image Caption论文？

论文 | CVPR2017有哪些值得读的Image Caption论文？

黑龙江大学自然语言处理实验室

16+阅读 · 2017年12月1日

Simple Recurrent Unit For Sentence Classification

Simple Recurrent Unit For Sentence Classification

哈工大SCIR

6+阅读 · 2017年11月29日

CVPR2017有哪些值得读的Image Caption论文？

CVPR2017有哪些值得读的Image Caption论文？

PaperWeekly

10+阅读 · 2017年11月29日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Improving Image Captioning by Leveraging Knowledge Graphs

Arxiv

8+阅读 · 2019年1月25日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning based on Deep Reinforcement Learning

Image Captioning based on Deep Reinforcement Learning

Arxiv

9+阅读 · 2018年9月13日

Recurrent Fusion Network for Image Captioning

Recurrent Fusion Network for Image Captioning

Arxiv

3+阅读 · 2018年7月31日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

微信扫码咨询专知VIP会员