Recently, image captioning has attracted considerable interest in both academia and industry. Most existing systems are built upon large-scale datasets of image-sentence pairs, which are, however, time-consuming to construct. Moreover, even the most advanced image captioning systems still struggle to achieve deep image understanding. In this work, we achieve unpaired image captioning by bridging the vision and language domains with high-level semantic information. The motivation stems from the fact that semantic concepts of the same modality can be extracted from both images and descriptions. To further improve the quality of the generated captions, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image. Extensive experiments on the MSCOCO dataset show that we can generate desirable captions without paired datasets. Furthermore, the proposed approach boosts five strong baselines under the paired setting, with the most significant improvement in CIDEr score reaching 8%, demonstrating that it is effective and generalizes well to a wide range of models.