Low-resolution text images are common in natural scenes, for example in documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An intuitive solution is to introduce super-resolution (SR) techniques as pre-processing. However, previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images (e.g., bicubic down-sampling), a degradation that is too simple to be suitable for real low-resolution text recognition. To this end, we propose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images captured in the wild by cameras with different focal lengths. It is more authentic and challenging than synthetic data, as shown in Fig. 1. We argue that improving recognition accuracy is the ultimate goal of scene text SR. For this purpose, we develop a new Text Super-Resolution Network, termed TSRN, with three novel modules. (1) A sequential residual block is proposed to extract the sequential information of the text images. (2) A boundary-aware loss is designed to sharpen the character boundaries. (3) A central alignment module is proposed to relieve the misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate that TSRN improves recognition accuracy by over 13% for CRNN, and by nearly 9.0% for ASTER and MORAN, compared to synthetic SR data. Furthermore, TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the recognition accuracy of LR images in TextZoom. For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN, respectively. Our results suggest that low-resolution text recognition in the wild is far from solved, and that more research effort is needed.
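To make the synthetic degradation mentioned above concrete, the following is a minimal sketch of bicubic down-sampling as it is commonly used to generate synthetic low-resolution training images. It is an illustration only, not the pipeline of any particular SISR method: the function names are hypothetical, the kernel parameter a = -0.5 is an assumed common choice, and no anti-aliasing pre-filter is applied.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resample_row(row, out_len):
    """Resample a 1-D list of pixel values to out_len samples (4-tap cubic)."""
    in_len = len(row)
    scale = in_len / out_len
    out = []
    for i in range(out_len):
        # Centre of output pixel i, expressed in input pixel coordinates.
        centre = (i + 0.5) * scale - 0.5
        left = math.floor(centre) - 1
        total = 0.0
        weight_sum = 0.0
        for j in range(left, left + 4):
            w = cubic_kernel(centre - j)
            # Clamp indices at the border (edge replication).
            total += w * row[min(max(j, 0), in_len - 1)]
            weight_sum += w
        out.append(total / weight_sum)
    return out

def bicubic_downsample(image, factor):
    """Shrink a 2-D grayscale image (list of rows) by an integer factor."""
    height, width = len(image), len(image[0])
    out_h, out_w = height // factor, width // factor
    # The kernel is separable: resample along rows first, then along columns.
    rows = [resample_row(r, out_w) for r in image]
    cols = [resample_row([rows[y][x] for y in range(height)], out_h)
            for x in range(out_w)]
    return [[cols[x][y] for x in range(out_w)] for y in range(out_h)]
```

A low-resolution image produced this way is a fixed, deterministic function of its high-resolution source. Pairs captured with different focal lengths do not have this property, which is the contrast the abstract draws.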