A Method for Detecting Text of Arbitrary Shapes in Natural Scenes That Improves Text Spotting

Understanding the meaning of text in images of natural scenes, such as highway signs or storefront emblems, is particularly challenging if the text is foreshortened in the image or the letters are artistically distorted. We introduce a pipeline-based text spotting framework that can both detect and recognize text in various fonts, shapes, and orientations in natural scene images with complicated backgrounds. The main contribution of our work is the text detection component, which we call UHT, short for UNet, Heatmap, and Textfill. UHT uses a UNet to compute heatmaps for candidate text regions and a textfill algorithm to produce tight polygonal boundaries around each word in the candidate text. Our method trains the UNet with ground-truth heatmaps that we obtain from the text bounding polygons provided by ground-truth annotations. Our text spotting framework, called UHTA, combines UHT with the state-of-the-art text recognition system ASTER. Experiments on four challenging public scene-text-detection datasets (Total-Text, SCUT-CTW1500, MSRA-TD500, and COCO-Text) show the effectiveness and generalization ability of UHT in detecting both straight (potentially rotated) and curved text in scripts of multiple languages. Our experimental results on the Total-Text dataset show that UHTA outperforms four state-of-the-art text spotting frameworks by at least 9.1 percentage points in F-measure, which suggests that UHTA may be used as a complete text detection and recognition system in real applications.
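To make the idea of deriving a ground-truth heatmap from a text bounding polygon concrete, the following is a minimal sketch. It assumes the simplest possible target, a binary mask that is 1.0 inside the annotated polygon and 0.0 elsewhere; the abstract does not specify how UHT actually constructs its heatmaps, so this is an illustration only. The function names `point_in_polygon` and `polygon_to_heatmap` are hypothetical and are not taken from the UHT implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_heatmap(
    polygon: Sequence[Point], height: int, width: int
) -> List[List[float]]:
    """Rasterize one text polygon into a binary heatmap (assumed target format).

    A pixel is set to 1.0 when its center lies inside the polygon.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                heatmap[row][col] = 1.0
    return heatmap


# Hypothetical annotation: an axis-aligned word box given as a 4-point polygon.
word_polygon = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]
heatmap = polygon_to_heatmap(word_polygon, height=5, width=6)
for row in heatmap:
    print(" ".join(str(int(v)) for v in row))
```

The same routine accepts polygons with more vertices, which is the case that matters for curved text. In practice a library rasterizer would be much faster than this per-pixel loop, and a smoother target (for example, values that decay toward the polygon boundary) may be preferable to a hard binary mask.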
