The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding space. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground-truth information forces us to use evaluation metrics based on binary relevance: given a sentence query, we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we introduce a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) that is optimized within a standard triplet loss. By incorporating our formulation into existing models, a \emph{large} improvement is obtained in scenarios where available training data is limited. We also demonstrate that, when employing the full training set, performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items. Code with our metrics and adaptive margin formulation will be made public.
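As a rough sketch of the idea (the notation below is illustrative and is not the paper's exact formulation), the fixed margin $\alpha$ of the standard triplet loss can be replaced by a margin that depends on how semantically close a negative caption is to the positive caption, as measured by CIDEr:
\begin{align*}
  \mathcal{L}(i, c) &= \big[\, m(c, \hat{c}) - s(i, c) + s(i, \hat{c}) \,\big]_{+}
                     \;+\; \big[\, m(c, c_{\hat{i}}) - s(i, c) + s(\hat{i}, c) \,\big]_{+}, \\
  m(c, c') &= \alpha \left( 1 - \frac{\mathrm{CIDEr}(c, c')}{Z} \right),
\end{align*}
where $[x]_{+} = \max(x, 0)$, $s(\cdot,\cdot)$ is the learned image-text similarity, $(\hat{i}, \hat{c})$ denote a negative image and a negative caption, $c_{\hat{i}}$ is a caption annotated for the negative image $\hat{i}$, and $Z$ is a normalization constant. The base margin $\alpha$, the linear form of $m$, and the symbol names are assumptions made only for this sketch; the intent it conveys is that semantically similar negatives (high CIDEr) receive a smaller margin and are therefore penalized less than truly dissimilar ones.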