The aim of image captioning is to generate captions by machine that describe image contents as humans do. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into a stereotype of replicating frequent phrases or sentences, neglecting the unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages the generation of discriminative captions. It brings unique advantages: (1) the self-retrieval guidance acts as a metric and an evaluator of caption discriminativeness, assuring the quality of generated captions; (2) the correspondence between generated captions and images is naturally incorporated into the generation process without human annotations, so our approach can exploit a large amount of unlabeled images to boost captioning performance without additional laborious annotation. We demonstrate the effectiveness of the proposed retrieval-guided method on the MS-COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions.
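To make the self-retrieval idea concrete, below is a minimal sketch (not the paper's implementation) of how a retrieval-based discriminativeness signal could be computed: each generated caption is used as a query to retrieve its own image from a batch, and the retrieval probability of the correct image serves as a reward. The encoder outputs (`caption_emb`, `image_emb`) are hypothetical stand-ins; in practice a learned joint text-image embedding would supply them.

```python
# A minimal sketch of a self-retrieval reward, assuming a joint
# text-image embedding space (the inputs here are hypothetical stand-ins).
import torch
import torch.nn.functional as F

def self_retrieval_reward(caption_emb: torch.Tensor,
                          image_emb: torch.Tensor) -> torch.Tensor:
    """Reward each generated caption by how well it retrieves its own image.

    caption_emb: (B, D) embeddings of the B generated captions.
    image_emb:   (B, D) embeddings of the corresponding B images
                 (row i of each tensor belongs to the same image).
    Returns a (B,) reward: the probability assigned to the correct image
    when the caption is used as a retrieval query against the whole batch.
    """
    cap = F.normalize(caption_emb, dim=-1)
    img = F.normalize(image_emb, dim=-1)
    sim = cap @ img.t()              # (B, B) caption-to-image similarities
    probs = sim.softmax(dim=-1)      # retrieval distribution per caption
    return probs.diag()              # mass placed on the caption's own image

# Toy usage: random embeddings standing in for real encoder outputs.
B, D = 8, 256
rewards = self_retrieval_reward(torch.randn(B, D), torch.randn(B, D))
print(rewards.shape)  # torch.Size([8])
```

Such a reward could be combined with standard caption-quality objectives during training; unlabeled images fit naturally, since computing it requires only the image and the caption the model itself generates, not ground-truth annotations.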