State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents, and they are prone to errors when dealing with documents with irregular distortions or large variations in depth. This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions and improve document recognition in a reliable and simpler manner. FDRNet focuses on high-frequency components in the Fourier space, which capture most structural information but are largely free of degradation in appearance. It dewarps documents with a flexible Thin-Plate Spline transformation that can handle various deformations effectively without requiring deformation annotations in training. These features allow FDRNet to learn from a small amount of simply labeled training images, and the learned model can dewarp documents with complex geometric distortion and recognize the restored texts accurately. To facilitate document restoration research, we create a benchmark dataset consisting of over one thousand camera-captured documents with different types of geometric and photometric distortion. Extensive experiments show that FDRNet outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks. In addition, FDRNet needs only a small amount of simply labeled training data and is easy to deploy.
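The idea of keeping only high-frequency components in the Fourier space can be illustrated with a minimal NumPy sketch. This is not FDRNet's actual pipeline, which the abstract does not specify. The function name `high_frequency_component`, the radial hard mask, and the `cutoff` value are assumptions chosen for illustration.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return the high-frequency part of a 2-D grayscale image.

    Hypothetical helper for illustration only. `cutoff` is a radius in
    cycles per pixel (0 to about 0.7); frequencies at or below it are
    zeroed, which removes slowly varying content such as shading.
    """
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape

    # Forward 2-D FFT of the image.
    spectrum = np.fft.fft2(image)

    # Frequency coordinates matching the FFT layout (cycles per pixel).
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Hard high-pass mask: keep only frequencies above the cutoff.
    mask = radius > cutoff

    # Back to the spatial domain; the imaginary part is numerical noise.
    return np.real(np.fft.ifft2(spectrum * mask))
```

Applied to a page image with uneven illumination, a filter like this suppresses the smooth shading while retaining sharp structure such as text strokes and page edges. A learned model would likely use a softer or data-dependent selection than this hard mask.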