Recognizing text in natural images remains an active research topic in computer vision due to its wide range of applications. Despite several decades of research on optical character recognition (OCR), recognizing text in natural scenes is still challenging, because scene text often appears in irregular arrangements (curved, arbitrarily oriented, or severely distorted) that have not yet been well addressed in the literature. Existing text recognition methods mainly target regular (horizontal and frontal) text and cannot be trivially generalized to irregular text. In this paper, we develop the arbitrary orientation network (AON) to capture deep features of irregular text (e.g., arbitrarily oriented, perspective, or curved), which are then fed into an attention-based decoder to generate character sequences. The whole network can be trained end-to-end using only images and word-level labels. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT, and ICDAR datasets, show that the proposed AON-based method substantially outperforms existing methods.
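The attention-based decoding mentioned above can be illustrated by a single step of dot-product attention over encoded feature columns. The following is a minimal sketch, not the paper's actual architecture: the dimensions, the projection matrices `W_h` and `W_o`, and the character-set size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T feature columns of dimension D, decoder state of
# size S, alphabet of K characters. Illustrative values, not the paper's.
T, D, S, K = 8, 16, 32, 37

H = rng.normal(size=(T, D))        # encoded image features (e.g. from an encoder like AON)
s = rng.normal(size=(S,))          # current decoder hidden state

# Randomly initialised projections (these would be learned in practice).
W_h = rng.normal(size=(D, S))      # projects feature columns into the state space
W_o = rng.normal(size=(D + S, K))  # maps [context; state] to character logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Dot-product attention: score each feature column against the decoder state.
scores = (H @ W_h) @ s             # shape (T,)
alpha = softmax(scores)            # attention weights over the T columns
context = alpha @ H                # weighted sum of feature columns, shape (D,)

# One decoding step: predict a distribution over the character set.
logits = np.concatenate([context, s]) @ W_o
char_probs = softmax(logits)
```

At each step the decoder attends to the encoded feature columns, forms a context vector, and emits one character; repeating this until an end-of-sequence symbol yields the predicted word.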