When describing an image, reading the text in the visual scene is crucial for understanding the key information. Recent work explores the TextCaps task, \emph{i.e.}, image captioning with reading Optical Character Recognition (OCR) tokens, which requires models to read text in images and incorporate it into the generated captions. Existing approaches fail to generate accurate descriptions because of (1) poor reading ability, (2) an inability to choose the crucial words among all the extracted OCR tokens, and (3) repetition of words in predicted captions. To this end, we propose Confidence-aware Non-repetitive Multimodal Transformers (CNMT) to tackle the above challenges. Our CNMT consists of a reading module, a reasoning module, and a generation module. The Reading Module employs better OCR systems to enhance text-reading ability and a confidence embedding to select the most noteworthy tokens. To address word redundancy in captions, our Generation Module includes a repetition mask that prevents repeated words from being predicted. Our model outperforms state-of-the-art models on the TextCaps dataset, improving CIDEr from 81.0 to 93.0. Our source code is publicly available.
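To make the repetition-mask idea concrete, the sketch below shows one minimal way such a mask could be realized during autoregressive decoding: the logits of tokens already emitted in the caption are suppressed so they cannot be predicted again. This is an illustrative assumption, not the authors' released implementation; the helper name is hypothetical, and the actual model may exempt common function words from the mask.

\begin{verbatim}
import torch

def apply_repetition_mask(logits: torch.Tensor,
                          generated_ids: list) -> torch.Tensor:
    """Hypothetical helper: forbid re-predicting emitted tokens.

    logits: (vocab_size,) scores over the combined vocabulary
            (common words + extracted OCR tokens) at one decode step.
    generated_ids: indices of tokens emitted so far in this caption.
    """
    masked = logits.clone()
    masked[generated_ids] = float("-inf")  # repeats become impossible
    return masked

# Usage at each decoding step (decoder call is schematic):
# step_logits = decoder(...)                       # (vocab_size,)
# step_logits = apply_repetition_mask(step_logits, caption_so_far)
# next_id = int(step_logits.argmax())
\end{verbatim}

Setting masked scores to $-\infty$ (rather than merely down-weighting them) guarantees zero probability after the softmax, which matches the stated goal of avoiding repeated words outright.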