Detection and recognition of text in natural images are two main problems in computer vision, with a wide variety of applications such as analysis of sports videos, autonomous driving, and industrial automation. Both face common challenges stemming from how text is represented and how it is affected by environmental conditions. Current state-of-the-art scene text detection and/or recognition methods have exploited recent advances in deep learning architectures and report superior accuracy on benchmark datasets when handling multi-resolution and multi-oriented text. However, several challenges affecting text in wild images still cause existing methods to underperform, because their models do not generalize to unseen data and because labeled data are insufficient. Thus, unlike previous surveys in this field, this survey has the following objectives. First, it offers the reader not only a review of recent advances in scene text detection and recognition, but also the results of extensive experiments using a unified evaluation framework that assesses pre-trained models of the selected methods on challenging cases and applies the same evaluation criteria to all of them. Second, it identifies several existing challenges for detecting or recognizing text in wild images, namely in-plane rotation, multi-oriented and multi-resolution text, perspective distortion, illumination reflection, partial occlusion, complex fonts, and special characters. Finally, the paper offers insight into potential research directions that address some of these challenges, which scene text detection and recognition techniques still face.