Scene text erasing, which replaces text regions with plausible content in natural images, has drawn significant attention in the computer vision community in recent years. Scene text erasing involves two subtasks: text detection and image inpainting. Both subtasks require considerable data to achieve good performance; however, the lack of a large-scale real-world scene-text removal dataset prevents existing methods from realizing their potential. To compensate for the lack of pairwise real-world data, we enhanced an existing synthetic text engine and trained our model solely on the dataset it generates. Our proposed network contains a stroke mask prediction module and a background inpainting module that extract the text stroke as a relatively small hole from the cropped text image, preserving more background content for better inpainting results. The model can partially erase text instances in a scene image given a bounding box, or work with an existing scene-text detector for fully automatic scene text erasing. Qualitative and quantitative evaluations on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrate that our method significantly outperforms existing state-of-the-art methods, even when those methods are trained on real-world data.
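The two-module pipeline described above can be sketched minimally as follows. This is a hypothetical illustration, not the paper's implementation: the real modules are learned networks, whereas here `predict_stroke_mask` is a simple intensity threshold and `inpaint_background` fills the stroke hole with the mean of the surviving background, showing how a small stroke-level hole keeps most background pixels intact.

```python
import numpy as np

def predict_stroke_mask(text_crop: np.ndarray) -> np.ndarray:
    # Stand-in for the stroke mask prediction module: bright pixels are
    # treated as text strokes (the paper uses a trained network instead).
    return (text_crop.mean(axis=-1) > 0.8).astype(np.float32)

def inpaint_background(text_crop: np.ndarray, stroke_mask: np.ndarray) -> np.ndarray:
    # Stand-in for the background inpainting module: only the stroke
    # pixels (the "relatively small hole") are filled, here with the
    # mean color of the untouched background pixels.
    out = text_crop.copy()
    bg = text_crop[stroke_mask == 0]
    fill = bg.mean(axis=0) if bg.size else np.zeros(text_crop.shape[-1])
    out[stroke_mask == 1] = fill
    return out

def erase_text(text_crop: np.ndarray):
    # Full pipeline on a cropped text region: mask the strokes, then
    # inpaint only those pixels, leaving the rest of the crop unchanged.
    mask = predict_stroke_mask(text_crop)
    return inpaint_background(text_crop, mask), mask
```

Because only stroke pixels are replaced, all non-stroke background content passes through unchanged, which is the motivation the abstract gives for predicting strokes rather than erasing whole text boxes.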