Text erasure from images is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be removed more effectively from low-resolution images, Part 1 operates on a low-resolution image, and its output is a low-resolution text-free image. Part 2 uses the features learned in Part 1 to predict a high-resolution text-free image. In Part 1, we use the Pyramid Vision Transformer (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps the network learn the image structure. To locate the text precisely, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than on the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all metrics. For example, on the SCUT-EnsText dataset, TPFNet achieves a PSNR (higher is better) of 39.0 and a text-detection precision (lower is better) of 21.1, compared with a PSNR of 32.3 and a precision of 53.2 for the best previous technique. The source code can be obtained from https://github.com/CandleLabAI/TPFNet
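As a rough illustration of the two-part layout described above, the following is a minimal PyTorch sketch: Part 1 encodes a downsampled input and decodes it through three heads (text-free image, high-pass image, segmentation map), and Part 2 reuses the Part-1 features to predict the full-resolution result. All module names and layer sizes here are our own assumptions, and a plain convolutional encoder stands in for the PVT; consult the linked repository for the authors' actual implementation.

```python
import torch
import torch.nn as nn


class MultiHeadDecoder(nn.Module):
    """Three-headed decoder: text-free RGB image, high-pass filtered
    image, and a text segmentation map (hypothetical layer sizes)."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.img_head = nn.Conv2d(64, 3, 3, padding=1)   # text-free image
        self.hp_head = nn.Conv2d(64, 3, 3, padding=1)    # high-pass branch
        self.seg_head = nn.Conv2d(64, 1, 3, padding=1)   # text segmentation

    def forward(self, feats):
        h = self.shared(feats)
        return self.img_head(h), self.hp_head(h), torch.sigmoid(self.seg_head(h))


class TPFNetSketch(nn.Module):
    """Two-part layout: Part 1 works on a downsampled input and yields a
    low-resolution text-free image plus auxiliary outputs; Part 2 reuses
    the Part-1 features to predict the high-resolution image."""
    def __init__(self):
        super().__init__()
        # Stand-in encoder; the paper uses a Pyramid Vision Transformer (PVT).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.decoder = MultiHeadDecoder(64)
        # Part 2: upsample Part-1 features and refine to full resolution.
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        # Part 1 runs at half resolution (noise is easier to remove there).
        x_low = nn.functional.interpolate(
            x, scale_factor=0.5, mode='bilinear', align_corners=False)
        feats = self.encoder(x_low)
        low_img, high_pass, seg = self.decoder(feats)
        # Part 2 predicts the high-resolution text-free image.
        high_img = self.refine(feats)
        return high_img, low_img, high_pass, seg


model = TPFNetSketch()
outs = model(torch.randn(1, 3, 256, 256))
print([t.shape for t in outs])  # full-res image, low-res image, high-pass, seg map
```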