检索列表图像说明 (Retrieval-augmented Image Captioning) - 专知论文

会员服务 ·

0

图像字幕 · 数据集 · MoDELS · 多峰值 · Processing（编程语言） ·

2023 年 2 月 16 日

Retrieval-augmented Image Captioning

翻译：检索列表图像说明

Rita Ramos,Desmond Elliott,Bruno Martins

Inspired by retrieval-augmented language generation and pretrained Vision and Language (V&L) encoders, we present a new approach to image captioning that generates sentences given the input image and a set of captions retrieved from a datastore, as opposed to the image alone. The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions. Experimental results on the COCO dataset show that image captioning can be effectively formulated from this new perspective. Our model, named EXTRA, benefits from using captions retrieved from the training dataset, and it can also benefit from using an external dataset without the need for retraining. Ablation studies show that retrieving a sufficient number of captions (e.g., k=5) can improve captioning quality. Our work contributes towards using pretrained V&L encoders for generative tasks, instead of standard classification tasks.

翻译：在检索强化的语言生成和预先培训的视觉和语言(V & L)编码器的启发下,我们展示了一种新的图像字幕方法,根据输入图像和从数据存储处检索的一组字幕生成句子,而不是仅仅图像。模型中的编码器使用预先培训的 V & L BERT 联合处理图像和检索字幕,而解码器则利用从检索的字幕中获取的额外文字证据进入多式编码器的演示。COCO数据集的实验结果表明,图像字幕可以从这一新的角度有效制作。我们的模型,名为 EXTRA, 使用从培训数据集检索的字幕的好处, 也可以在不需要再培训的情况下使用外部数据集。调整研究表明, 重新获取足够数量的字幕( e.g., k=5) 能够提高字幕质量。我们的工作有助于使用事先培训的 V & L 编码器进行基因化任务, 而不是标准分类任务。

0

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

43+阅读 · 2021年7月17日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

ACT结构域的调控机制及潜在农药靶标研究

国家自然科学基金

0+阅读 · 2014年12月31日

调控马铃薯干旱胁迫响应相关转录因子的miRNA功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

一类新颖结构的链霉菌源Vicenistations类抗肿瘤成分研究

国家自然科学基金

0+阅读 · 2012年12月31日

microRNAs-22经PTEN/Akt信号通路对心肌肥厚的调控

国家自然科学基金

0+阅读 · 2011年12月31日

新基因CIAPIN1相互作用分子及其信号转导网络的研究

国家自然科学基金

0+阅读 · 2009年12月31日

编织C/SiC复合材料的各向异性损伤演化及本构关系研究

国家自然科学基金

0+阅读 · 2009年12月31日

NiMnInCo合金薄膜的磁驱动马氏体相变及磁感生应变研究

国家自然科学基金

0+阅读 · 2009年12月31日

利用GPS与IM/WS干涉测量监测鲜水河断层变形

国家自然科学基金

0+阅读 · 2008年12月31日

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Arxiv

0+阅读 · 2023年4月5日

Cross-Domain Image Captioning with Discriminative Finetuning

Arxiv

0+阅读 · 2023年4月4日

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Arxiv

0+阅读 · 2023年4月3日

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Arxiv

0+阅读 · 2023年4月3日

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

Arxiv

0+阅读 · 2023年4月3日

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

43+阅读 · 2021年7月17日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Arxiv

0+阅读 · 2023年4月5日

Cross-Domain Image Captioning with Discriminative Finetuning

Arxiv

0+阅读 · 2023年4月4日

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Arxiv

0+阅读 · 2023年4月3日

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Arxiv

0+阅读 · 2023年4月3日

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

Arxiv

0+阅读 · 2023年4月3日

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

相关基金

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

ACT结构域的调控机制及潜在农药靶标研究

国家自然科学基金

0+阅读 · 2014年12月31日

调控马铃薯干旱胁迫响应相关转录因子的miRNA功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

一类新颖结构的链霉菌源Vicenistations类抗肿瘤成分研究

国家自然科学基金

0+阅读 · 2012年12月31日

microRNAs-22经PTEN/Akt信号通路对心肌肥厚的调控

国家自然科学基金

0+阅读 · 2011年12月31日

新基因CIAPIN1相互作用分子及其信号转导网络的研究

国家自然科学基金

0+阅读 · 2009年12月31日

编织C/SiC复合材料的各向异性损伤演化及本构关系研究

国家自然科学基金

0+阅读 · 2009年12月31日

NiMnInCo合金薄膜的磁驱动马氏体相变及磁感生应变研究

国家自然科学基金

0+阅读 · 2009年12月31日

利用GPS与IM/WS干涉测量监测鲜水河断层变形

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员