Image captioning models generally lack the ability to take user interest into account, and usually default to global descriptions that try to balance readability and informativeness while avoiding information overload. VQA models, on the other hand, generally cannot produce long, descriptive answers, and expect the textual question to be quite precise. We present a method to control the concepts that an image caption should focus on, using an additional input called the guiding text, which refers to either groundable or ungroundable concepts in the image. Our model consists of a Transformer-based multimodal encoder that combines the guiding text with global and object-level image features to derive early-fusion representations from which the guided caption is generated. While models trained on Visual Genome data have the in-domain advantage of fitting well when guided with automatic object labels, we find that guided captioning models trained on Conceptual Captions generalize better to out-of-domain images and guiding texts. Our human-evaluation results indicate that attempting in-the-wild guided image captioning requires access to large, unrestricted-domain training datasets, and that increased style diversity (even without increasing vocabulary size) is a key factor for improved performance.
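To make the described architecture concrete, below is a minimal PyTorch sketch of the early-fusion encoder-decoder: guiding-text token embeddings, a projected global image feature, and projected object-level features are fused into a single sequence, encoded jointly by a Transformer encoder, and attended to by a Transformer decoder that generates the guided caption. This is not the authors' implementation; all names, dimensions, and hyperparameters (e.g. GuidedCaptioner, d_model=512, 2048-d visual features) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GuidedCaptioner(nn.Module):
    """Hypothetical early-fusion guided captioner (illustrative sketch)."""

    def __init__(self, vocab_size=10000, d_model=512, n_heads=8,
                 n_layers=6, img_feat_dim=2048, obj_feat_dim=2048):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project visual features into the shared embedding space.
        self.global_proj = nn.Linear(img_feat_dim, d_model)
        self.object_proj = nn.Linear(obj_feat_dim, d_model)
        # Learned markers distinguishing the three modalities in the fused sequence.
        self.modality_emb = nn.Embedding(3, d_model)  # 0=text, 1=global, 2=objects
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, guide_ids, global_feat, obj_feats, caption_ids):
        # guide_ids:   (B, Tg)               guiding-text token ids
        # global_feat: (B, img_feat_dim)     whole-image feature
        # obj_feats:   (B, K, obj_feat_dim)  object-level features
        text = self.token_emb(guide_ids) + self.modality_emb.weight[0]
        glob = self.global_proj(global_feat).unsqueeze(1) + self.modality_emb.weight[1]
        objs = self.object_proj(obj_feats) + self.modality_emb.weight[2]
        # Early fusion: one sequence over all modalities, encoded jointly.
        memory = self.encoder(torch.cat([text, glob, objs], dim=1))
        # Teacher-forced decoding of the guided caption with a causal mask.
        tgt = self.token_emb(caption_ids)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)  # (B, Tc, vocab_size) caption logits


# Example usage with random inputs (2 images, 36 detected objects each):
model = GuidedCaptioner()
logits = model(torch.randint(0, 10000, (2, 5)),    # guiding text
               torch.randn(2, 2048),               # global image feature
               torch.randn(2, 36, 2048),           # object-level features
               torch.randint(0, 10000, (2, 12)))   # caption (teacher forcing)
```

In this sketch, per-modality marker embeddings stand in for whatever positional or segment encoding the actual model uses; the key point is that text and visual tokens interact through self-attention from the very first encoder layer, rather than being fused late.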