The aim of image captioning is to automatically generate captions that describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most conventional approaches imitate language structure patterns and thus tend to fall into a stereotype of replicating frequent phrases or sentences, neglecting the unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages the model to generate discriminative captions. It brings unique advantages: (1) the self-retrieval guidance acts as a metric and an evaluator of caption discriminativeness, assuring the quality of generated captions; (2) the correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, so our approach can exploit a large number of unlabeled images to boost captioning performance without additional laborious annotation. We demonstrate the effectiveness of the proposed retrieval-guided method on the COCO and Flickr30k captioning datasets, and show that it achieves superior captioning performance with more discriminative captions.
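To make the self-retrieval guidance concrete, the following is a minimal sketch (not the authors' released code) of one way such a reward could be computed: each generated caption is used as a query to retrieve its own source image from the mini-batch, and a margin-based retrieval score serves as the discriminativeness signal. The encoders, the margin value, and the idea of mixing this reward into a self-critical/REINFORCE objective are illustrative assumptions, not details taken from the abstract.

```python
import torch
import torch.nn.functional as F


def self_retrieval_reward(caption_emb: torch.Tensor,
                          image_emb: torch.Tensor,
                          margin: float = 0.1) -> torch.Tensor:
    """caption_emb, image_emb: (batch, dim) embeddings in a joint space.

    Returns a per-sample reward that is positive when a caption ranks its
    own image above all other images in the batch by at least `margin`.
    """
    # Cosine similarity between every generated caption and every image.
    cap = F.normalize(caption_emb, dim=-1)
    img = F.normalize(image_emb, dim=-1)
    sim = cap @ img.t()                                  # (batch, batch)

    pos = sim.diag()                                     # paired image score
    # Mask the positive pair and keep the hardest distractor per caption.
    neg = sim - torch.eye(sim.size(0), device=sim.device) * 1e9
    hardest_neg = neg.max(dim=1).values

    # Margin-based discriminativeness reward.
    return pos - hardest_neg - margin


if __name__ == "__main__":
    # Toy usage: in training, this reward would typically be combined with
    # a caption-quality reward (e.g. CIDEr) inside a policy-gradient loop.
    cap_emb = torch.randn(8, 512)
    img_emb = torch.randn(8, 512)
    print(self_retrieval_reward(cap_emb, img_emb).shape)  # torch.Size([8])
```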