Scene text erasing, which replaces text regions with plausible content in natural images, has drawn attention in the computer vision community in recent years. Scene text erasing comprises two potential subtasks: text detection and image inpainting. Both subtasks require considerable data to achieve good performance; however, the lack of a large-scale real-world scene text removal dataset prevents existing methods from reaching their full potential. To sidestep the lack of paired real-world data, we enhance and make full use of synthetic text, and consequently train our model only on a dataset generated by the improved synthetic text engine. Our proposed network consists of a stroke mask prediction module and a background inpainting module; it extracts the text stroke as a relatively small hole in the text image patch, preserving more background content and thereby yielding better inpainting results. The model can partially erase text instances in a scene image when bounding boxes are provided, or it can be combined with an existing scene text detector for automatic scene text erasing. Qualitative and quantitative evaluations on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrate that our method significantly outperforms existing state-of-the-art methods, even those trained on real-world data.
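The two-stage idea described above (predict a stroke-level mask, then fill only those pixels so the surrounding background survives) can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, the stroke mask is assumed to come from the learned prediction module, and the learned background inpainting module is replaced here by a naive background-median fill.

```python
import numpy as np

def erase_text_patch(patch, stroke_mask):
    """Erase text strokes from an image patch.

    patch: (H, W, 3) uint8 image patch cropped around a text instance.
    stroke_mask: (H, W) bool array; True marks predicted stroke pixels,
        i.e. the relatively small "hole" to fill. In the described
        pipeline this mask comes from the stroke mask prediction module.

    The learned background inpainting module is replaced by a simple
    median fill over the unmasked background pixels, purely for
    illustration of the data flow.
    """
    out = patch.copy()
    # Estimate a background color from pixels outside the stroke hole.
    background_color = np.median(patch[~stroke_mask], axis=0)
    # Fill only the stroke pixels; everything else is kept as-is,
    # which is the point of using a stroke-level (not box-level) hole.
    out[stroke_mask] = background_color.astype(np.uint8)
    return out
```

Because only stroke pixels are replaced, textured background immediately around the glyphs is preserved, which a box-level hole would discard.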