Scene text recognition (STR) is an important bridge between images and text and has attracted abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress on this task, most existing works need an extra context modeling module to help the CNN capture global dependencies, compensating for its inductive bias and strengthening the relationships between text features. Recently, the transformer has emerged as a promising network for global context modeling via its self-attention mechanism, but efficiency remains one of its main shortcomings when applied to recognition. We propose a 1-D split to address this complexity challenge and replace the CNN with a transformer encoder, reducing the need for a separate context modeling module. Furthermore, recent methods use a frozen initial embedding to guide the decoder in decoding the features into text, which leads to a loss of accuracy. We instead propose a learnable initial embedding, learned from the transformer encoder, that adapts to different input images. Building on these ideas, we introduce a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG), composed of three stages (transformation, feature extraction, and prediction). Extensive experiments show that our approach achieves state-of-the-art results on text recognition benchmarks.
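The two core ideas above can be illustrated with a minimal numpy sketch. All shapes here are hypothetical placeholders, not the paper's actual configuration: the 1-D split flattens a 2-D feature map into a single token sequence for self-attention, and the learnable initial embedding is summarized here by mean-pooling the encoder tokens (one plausible way to make the decoder's start query adaptive to the input image; the paper's exact mechanism may differ).

```python
import numpy as np

# Hypothetical shapes for illustration: batch, channels, feature-map height/width.
B, C, H, W = 2, 64, 8, 32

# 1-D split: flatten the 2-D spatial grid into one sequence of H*W tokens,
# giving the transformer encoder a single 1-D sequence to attend over.
feat_2d = np.random.randn(B, C, H, W)
tokens = feat_2d.reshape(B, C, H * W).transpose(0, 2, 1)  # (B, H*W, C)

# Adaptive initial embedding: instead of a frozen start embedding, derive the
# decoder's initial query from the encoder output (mean-pooling is an
# assumption made for this sketch), so it varies with the input image.
init_embedding = tokens.mean(axis=1)  # (B, C)
```

The key point is that `init_embedding` depends on `tokens`, so every input image seeds the decoder differently, unlike a frozen, image-independent start embedding.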