Although recent deep-learning-based methods have improved the accuracy of scene text recognition, handling low-quality text images in end-to-end deep networks remains an open research challenge. In this paper, we propose an Iterative Fusion based Recognizer (IFR) for low-quality scene text recognition, which takes advantage of refined text image input and robust feature representation. IFR contains two branches, which focus on scene text recognition and low-quality scene text image recovery, respectively. We employ an iterative collaboration between the two branches, which effectively alleviates the impact of low-quality input. A feature fusion module is proposed to strengthen the feature representation of the two branches, in which the features from the Recognizer are Fused with those of the image Restoration branch; we refer to this module as RRF. Without changing the structure of the recognition network, extensive quantitative and qualitative experiments show that the proposed method significantly outperforms the baseline methods in recognition accuracy, both on benchmark datasets and on the low-resolution images of the TextZoom dataset.
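The iterative collaboration between the two branches can be illustrated with a minimal, purely schematic Python sketch. It is not the IFR implementation: the names `fuse`, `iterative_recognition`, `toy_recognize` and `toy_restore` are hypothetical, images and features are plain lists of floats, the element-wise average stands in for the RRF module, and the number of iterations is an arbitrary choice. The sketch only shows the control flow suggested by the abstract, in which the restoration branch refines the image using recognizer features and the recognizer is then re-run on the refined image.

```python
from typing import Callable, List, Tuple

Image = List[float]     # toy stand-in for an image tensor
Features = List[float]  # toy stand-in for recognizer features


def fuse(restoration_feats: Features, recognizer_feats: Features) -> Features:
    """Hypothetical stand-in for RRF: element-wise average of the two feature sets."""
    return [(a + b) / 2 for a, b in zip(restoration_feats, recognizer_feats)]


def iterative_recognition(
    image: Image,
    recognize: Callable[[Image], Tuple[str, Features]],
    restore: Callable[[Image, Features], Image],
    num_iters: int = 3,  # assumed value, not taken from the paper
) -> Tuple[str, List[str]]:
    """Alternate between restoration and recognition for a fixed number of rounds."""
    current = image
    text, feats = recognize(current)       # initial prediction on the raw input
    history = [text]
    for _ in range(num_iters):
        current = restore(current, feats)  # refine the image with recognizer features
        text, feats = recognize(current)   # re-recognize the refined image
        history.append(text)
    return text, history


# Toy branches, for illustration only.
def toy_recognize(image: Image) -> Tuple[str, Features]:
    # Pretend the text is readable once every pixel is bright enough.
    text = "ok" if all(p > 0.85 for p in image) else "??"
    return text, list(image)


def toy_restore(image: Image, recognizer_feats: Features) -> Image:
    # Fuse the features, then move each value halfway toward a clean value of 1.0.
    fused = fuse(image, recognizer_feats)
    return [f + 0.5 * (1.0 - f) for f in fused]


if __name__ == "__main__":
    final_text, history = iterative_recognition(
        [0.2, 0.4, 0.6, 0.8], toy_recognize, toy_restore
    )
    print(final_text, history)
```

In this toy setting the prediction changes only after the image has been refined several times, which mirrors the intuition that each branch benefits from the other's output; the actual networks, losses and fusion operator are those described in the paper, not the ones assumed here.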