ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

In recent years, scene text recognition has commonly been treated as a sequence-to-sequence problem. Connectionist Temporal Classification (CTC) and attentional sequence recognition (Attn) are the two prevailing approaches to this problem, but each can fail in certain scenarios. CTC concentrates on each individual character but is weak at modeling semantic dependencies within the text. Attn-based methods model contextual semantics better but tend to overfit on limited training data. In this paper, we design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition. To overcome the weaknesses of CTC and Attn, our method applies both, using different modules in two supervised branches that complement each other. Moreover, spatial and channel attention mechanisms are introduced to suppress background noise and extract valid foreground information. Finally, a simple rectification network is used to rectify irregular text. ReADS can be trained end-to-end and requires only word-level annotations. Extensive experiments on various benchmarks verify the effectiveness of ReADS, which achieves state-of-the-art performance.
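The two supervised branches described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the blank index `BLANK = 0`, and the weighting factor `lam` are all assumptions made for illustration, and the abstract does not say how the two branch losses are actually combined. The sketch shows best-path CTC decoding (argmax per frame, merge repeats, drop blanks), a per-step cross-entropy of the kind typically used for an attention decoder, and a hypothetical weighted sum of the two losses.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: list of per-frame probability lists; index 0 is the blank,
    index i >= 1 maps to alphabet[i - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the label changes and is not a blank.
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


def attn_cross_entropy(step_probs, target_ids):
    """Mean negative log-likelihood of the targets over decoder steps."""
    total = sum(math.log(p[t]) for p, t in zip(step_probs, target_ids))
    return -total / len(target_ids)


def combined_loss(ctc_loss, attn_loss, lam=0.5):
    """Hypothetical weighted sum of the two branch losses (lam is assumed)."""
    return lam * ctc_loss + (1.0 - lam) * attn_loss


# Five frames over the alphabet "ab": a, a, blank, a, b
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
decoded = ctc_greedy_decode(frames, "ab")
```

In this example the blank frame between the second and third "a" is what lets CTC emit a repeated character. The attention branch needs no blank symbol because it predicts one character per decoder step.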


Related content

Attention mechanism


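The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Attention-based extensions of RNN models then began to be applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.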

[CVPR 2020] SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition (May 22, 2020)

Collection of 50+ papers on Neural Architecture Search (NAS), 2020 (March 19, 2020)

Latest collection of 100+ papers on Self-Supervised Learning (March 18, 2020)

Bayesian Inference with Generative Adversarial Network Priors (February 18, 2020)