Recent research on text localization in resource-constrained environments has made extensive use of deep neural networks. Scene text localization and recognition on low-memory mobile devices have a wide range of applications, including content extraction, image categorization and keyword-based image search. To recognize multi-lingual localized text, OCR systems require prior knowledge of the script of each text instance, which makes word-level script identification an essential step in text recognition. Most existing methods treat text localization, script identification and text recognition as three separate tasks, so script identification becomes an overhead in the recognition pipeline. To reduce this overhead, we propose TeLCoS: OnDevice Text Localization with Clustering of Script, a lightweight, multi-task, dual-branch CNN that performs real-time, on-device text localization and high-level script clustering simultaneously. The network drastically reduces the number of calls to a separate script identification module by grouping and identifying several widely used scripts in a single feed-forward pass over the localization network. We also introduce a novel structural-similarity-based channel pruning mechanism to build an efficient network with only 1.15M parameters. Experiments on benchmark datasets suggest that our method achieves state-of-the-art performance, with an execution latency of 60 ms for the entire pipeline on a device with the Exynos 990 chipset.
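The abstract does not spell out how the structural-similarity-based channel pruning works, so the following is only a minimal sketch of one plausible reading, not the authors' mechanism. It assumes that channels whose feature maps are near-duplicates of an already kept channel (by a single-window, global SSIM score) can be dropped. The function names `global_ssim` and `select_channels`, the 0.9 threshold, and the synthetic feature maps are all hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window (global) SSIM between two 2-D feature maps."""
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def select_channels(feature_maps, threshold=0.9):
    """Greedily keep channels that are not near-duplicates of a kept channel.

    feature_maps: array of shape (C, H, W), values roughly in [0, 1].
    threshold: hypothetical SSIM cut-off; a channel is dropped when its
               SSIM to any already kept channel reaches this value.
    Returns the list of channel indices to keep.
    """
    kept = []
    for c in range(feature_maps.shape[0]):
        if all(
            global_ssim(feature_maps[c], feature_maps[k]) < threshold
            for k in kept
        ):
            kept.append(c)
    return kept


# Synthetic stand-in for one layer's activations (assumption, not real data):
# channel 1 is a slightly perturbed copy of channel 0, channel 2 is unrelated.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
maps = np.stack([
    base,
    base + 1e-3 * rng.standard_normal((32, 32)),
    rng.random((32, 32)),
])
kept = select_channels(maps)
print(kept)
```

In a real network the feature maps would come from a forward pass over sample images, and the kept indices would be used to slice the corresponding convolution filters. The greedy order and the threshold are design choices of this sketch only.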