Compared with flatbed scanners, portable smartphones are much more convenient for digitizing physical documents. However, the resulting images are often distorted by uncontrolled physical deformations, camera positions, and illumination variations. To this end, we present DocScanner, a novel framework for document image rectification. Unlike existing methods, DocScanner addresses this problem with a progressive learning mechanism. Specifically, DocScanner maintains a single estimate of the rectified image, which is progressively corrected by a recurrent architecture. The iterative refinements let DocScanner converge to robust, superior performance, while the lightweight recurrent architecture keeps it efficient at run time. In addition, because prior works often produce corrupted boundaries in their rectified images, DocScanner first applies a document localization module that explicitly segments the foreground document from the cluttered background before rectification. To further improve rectification quality, a geometric regularization based on the geometric prior between the distorted and rectified images is introduced during training. Extensive experiments are conducted on the Doc3D dataset and the DocUNet Benchmark dataset, and both quantitative and qualitative evaluation results verify the effectiveness of DocScanner, which outperforms previous methods by a considerable margin in OCR accuracy, image similarity, and our proposed distortion metric. Furthermore, DocScanner achieves the highest efficiency in both runtime latency and model size.
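The abstract does not spell out how the pipeline is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: the localization module is assumed to output a binary foreground mask, the rectification is assumed to be expressed as a backward map (for every rectified pixel, a sampling coordinate in the distorted photo), and the recurrent module is assumed to refine that single estimate additively, one residual update per iteration. All names (`mask_background`, `bilinear_sample`, `warp`, `refine`, `toy_update`) are hypothetical and not DocScanner's code; the toy update operator simply predicts half of the remaining residual in place of a learned network.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]         # H x W grayscale intensities
Coord = Tuple[float, float]       # (x, y) sampling position in the distorted image
BackwardMap = List[List[Coord]]   # H' x W': where each rectified pixel samples from


def mask_background(image: Image, mask: List[List[int]]) -> Image:
    """Zero out pixels outside the document (assumed binary mask from a localization step)."""
    return [[p if m else 0.0 for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


def bilinear_sample(image: Image, x: float, y: float) -> float:
    """Bilinearly interpolate `image` at (x, y), clamping coordinates to the image border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(image: Image, bmap: BackwardMap) -> Image:
    """Build the rectified image by sampling the distorted image at each backward-map coordinate."""
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in bmap]


def refine(bmap: BackwardMap, update: Callable[[BackwardMap], BackwardMap], iters: int) -> BackwardMap:
    """Progressively correct a single estimate: bmap_{k+1} = bmap_k + update(bmap_k)."""
    for _ in range(iters):
        delta = update(bmap)
        bmap = [[(x + dx, y + dy) for (x, y), (dx, dy) in zip(row, drow)]
                for row, drow in zip(bmap, delta)]
    return bmap


if __name__ == "__main__":
    # Toy demo: a linear "distorted" image and a known target map shifted by half a pixel.
    distorted = [[2.0 * x + y for x in range(6)] for y in range(4)]
    target = [[(x + 0.5, float(y)) for x in range(5)] for y in range(4)]
    identity = [[(float(x), float(y)) for x in range(5)] for y in range(4)]

    def toy_update(m: BackwardMap) -> BackwardMap:
        # Hypothetical stand-in for the learned recurrent module: predicts half the remaining residual.
        return [[(0.5 * (tx - x), 0.5 * (ty - y)) for (x, y), (tx, ty) in zip(row, trow)]
                for row, trow in zip(m, target)]

    estimate = refine(identity, toy_update, iters=20)
    rectified = warp(distorted, estimate)
    print(max(abs(rectified[y][x] - (2 * x + 1 + y)) for y in range(4) for x in range(5)))
```

Because each iteration only adds a correction to the current estimate, the same small update operator can be applied repeatedly, which is one way a recurrent design can stay lightweight; in a real system the update would be a trained network conditioned on image features rather than on a known target.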