MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Recently, end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region of interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such a framework, the recognition part is highly sensitive to the detected results (e.g., the compactness of text contours). To address this problem, in this paper, we propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO, in which character sequences can be directly recognized without RoI operations. Concretely, a position-aware mask attention module is developed to generate attention weights on each text instance and its characters. It allows different text instances in an image to be allocated to different feature map channels, which are further grouped as a batch of instance features. Finally, a lightweight sequence decoder is applied to generate the character sequences. It is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting and can be trained end-to-end with only coarse position information (e.g., rectangular bounding boxes) and text annotations. Experimental results show that the proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks, i.e., ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
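The grouping step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor shapes, the spatial softmax normalization, and the names `softmax_spatial` and `group_instance_features` are assumptions chosen for illustration. It only shows how per-instance, per-character attention maps could pool a shared feature map into a batch of instance features without any RoI operation.

```python
import numpy as np


def softmax_spatial(logits):
    """Normalize attention logits over the last two (H, W) axes."""
    flat = logits.reshape(*logits.shape[:-2], -1)
    flat = flat - flat.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights.reshape(logits.shape)


def group_instance_features(features, attn_logits):
    """Pool a shared feature map into per-instance character features.

    features:    (C, H, W) shared feature map (assumed shape)
    attn_logits: (N, L, H, W) one attention map per instance channel N
                 and character position L (assumed layout)
    returns:     (N, L, C) batch of instance features
    """
    weights = softmax_spatial(attn_logits)
    # Weighted sum over spatial positions for every (instance, character).
    return np.einsum("nlhw,chw->nlc", weights, features)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))        # C=8, H=W=16
attn_logits = rng.standard_normal((4, 5, 16, 16))  # N=4 instances, L=5 chars
instance_batch = group_instance_features(features, attn_logits)
print(instance_batch.shape)
```

The resulting `(N, L, C)` array is what a sequence decoder would consume as a batch, one sequence per text instance. The abstract calls this decoder lightweight but does not describe it, so it is omitted here.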


Related content

Attention mechanism


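The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; their work is regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.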
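The joint align-and-translate idea of Bahdanau et al. [1] is commonly described as additive attention: a decoder state is scored against every encoder state, the scores are normalized with a softmax, and the weighted sum becomes the context vector. The sketch below is a minimal NumPy illustration with random weights; the function name `additive_attention` and all dimensions are assumptions made for illustration, not code from the paper.

```python
import numpy as np


def additive_attention(query, keys, w_q, w_k, v):
    """Additive (Bahdanau-style) attention for a single decoder step.

    query: (d_q,)      current decoder state
    keys:  (T, d_k)    encoder states, one per source position
    w_q:   (d_a, d_q)  projection of the query (illustrative weights)
    w_k:   (d_a, d_k)  projection of the keys (illustrative weights)
    v:     (d_a,)      scoring vector
    """
    # score_j = v^T tanh(W_q q + W_k h_j), computed for all j at once
    scores = np.tanh(keys @ w_k.T + w_q @ query) @ v  # (T,)
    scores = scores - scores.max()                     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # soft alignment
    context = weights @ keys                           # (d_k,)
    return context, weights


rng = np.random.default_rng(0)
T, d_q, d_k, d_a = 6, 4, 5, 3
context, weights = additive_attention(
    rng.standard_normal(d_q),
    rng.standard_normal((T, d_k)),
    rng.standard_normal((d_a, d_q)),
    rng.standard_normal((d_a, d_k)),
    rng.standard_normal(d_a),
)
print(context.shape, weights.shape)
```

The `weights` vector plays the role of the soft alignment between the current output position and the source positions.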

[ICCV 2021] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

October 17, 2021

[CVPR 2021] Sequence-to-Sequence Contrastive Learning for Text Recognition

May 2, 2021

SIGIR 2021 accepted paper list released: all 151 papers are here

April 27, 2021

[AAAI 2021] Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers

February 4, 2021