Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of text images, and the resulting images significantly affect the performance of downstream tasks. Although much progress has been made, existing approaches suffer from two crucial issues: (1) They neglect the global structure of the text, which bounds the semantic determinism of the scene text. (2) The priors employed in existing works, e.g., the text prior or the stroke prior, are extracted from pre-trained text recognizers. However, such priors suffer from a domain gap, because the input images are degraded by low resolution and blurriness caused by poor imaging conditions, which leads to incorrect guidance. Our work addresses these gaps and proposes a plug-and-play module dubbed Dual Prior Modulation Network (DPMN), which leverages dual image-level priors to bring performance gains over existing approaches. Specifically, two types of prior-guided refinement modules are designed: one uses the text mask and the other uses the graphic recognition result of the low-quality SR image from the preceding layer, improving the structural clarity and the semantic accuracy of the text, respectively. A subsequent attention mechanism then modulates the two quality-enhanced images to attain a superior SR result. Extensive experiments validate that our method improves image quality and boosts the performance of downstream tasks over five typical approaches on the benchmark. Further visualizations and ablation studies demonstrate the advantages of the proposed DPMN. Code is available at: https://github.com/jdfxzzy/DPMN.
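To make the final fusion step more concrete, here is a minimal sketch of how two refined SR images could be combined with per-pixel attention weights. It assumes a softmax over the two branches at each pixel and takes the attention logits as given inputs; in a trained network these logits would be predicted by learned layers. The function names `softmax_pair` and `attention_fuse`, the grayscale list-of-rows image representation, and the softmax weighting itself are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import math
from typing import List, Tuple

# Grayscale image represented as rows of pixel intensities (illustrative assumption).
Image = List[List[float]]


def softmax_pair(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable softmax over two branch scores."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def attention_fuse(img_struct: Image, img_sem: Image,
                   logit_struct: Image, logit_sem: Image) -> Image:
    """Fuse two refined SR images with per-pixel attention weights.

    img_struct:   output of a hypothetical text-mask-guided refinement branch
    img_sem:      output of a hypothetical recognition-guided refinement branch
    logit_struct, logit_sem: per-pixel attention logits (assumed given here;
                  a real model would predict them with learned layers)
    """
    if not (len(img_struct) == len(img_sem) == len(logit_struct) == len(logit_sem)):
        raise ValueError("all inputs must have the same number of rows")
    fused: Image = []
    for r in range(len(img_struct)):
        row = []
        for c in range(len(img_struct[r])):
            # Weights for the two branches sum to 1 at every pixel.
            w_struct, w_sem = softmax_pair(logit_struct[r][c], logit_sem[r][c])
            row.append(w_struct * img_struct[r][c] + w_sem * img_sem[r][c])
        fused.append(row)
    return fused
```

With equal logits the two branches contribute equally, so `attention_fuse([[0.0, 1.0]], [[1.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]])` averages them to `[[0.5, 0.5]]`; a much larger logit for one branch makes the output follow that branch almost exactly. A per-pixel softmax is one simple way to let the fused result favor the structure-refined image in some regions and the semantics-refined image in others.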