In recent years, considerable effort has gone into document image rectification, but existing advanced algorithms handle only restricted document images: the input must contain a complete document. When the captured image covers only a local text region, rectification quality degrades noticeably. Our previously proposed DocTr, a transformer-assisted network for document image rectification, shares this limitation. In this work, we present DocTr++, a novel unified framework for document image rectification that places no restrictions on the input distorted images. Our main technical improvements fall into three areas. First, we upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. Second, we reformulate the pixel-wise mapping relationship between unrestricted distorted document images and their distortion-free counterparts. The resulting data is used to train DocTr++ for unrestricted document image rectification. Third, we contribute a real-world test set and metrics suitable for evaluating rectification quality. To the best of our knowledge, this is the first learning-based method for rectifying unrestricted document images. Extensive experiments demonstrate the effectiveness and superiority of our method. We hope DocTr++ will serve as a strong baseline for generic document image rectification and prompt further advancement and application of learning-based algorithms. The source code and the proposed dataset are publicly available at https://github.com/fh2019ustc/DocTr-Plus.
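The abstract does not spell out how the pixel-wise mapping is applied, so the following is only a minimal sketch of one common way such a mapping is used. It assumes the mapping is a dense backward map that gives, for every pixel of the rectified output, the sub-pixel location to sample in the distorted input. The function name `rectify`, the `(x, y)` channel order, and the border-clamping behavior are illustrative assumptions, not details taken from DocTr++.

```python
import numpy as np


def rectify(distorted: np.ndarray, backward_map: np.ndarray) -> np.ndarray:
    """Warp a distorted image with a dense backward map (hypothetical helper).

    distorted:    array of shape (H_in, W_in, C); the channel axis is required.
    backward_map: array of shape (H_out, W_out, 2); backward_map[i, j] = (x, y)
                  is the location in `distorted` whose color belongs at
                  pixel (i, j) of the rectified output (assumed layout).
    Returns a float array of shape (H_out, W_out, C).
    """
    h_in, w_in = distorted.shape[:2]

    # Clamp coordinates so samples outside the image reuse the border pixels.
    x = np.clip(backward_map[..., 0], 0, w_in - 1)
    y = np.clip(backward_map[..., 1], 0, h_in - 1)

    # Integer corners of the cell that surrounds each sample location.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_in - 1)
    y1 = np.minimum(y0 + 1, h_in - 1)

    # Fractional offsets, with a trailing axis so they broadcast over channels.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    # Bilinear interpolation between the four neighboring pixels.
    img = distorted.astype(np.float64)
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

With an identity map (each output pixel pointing at its own coordinates) the function returns the input unchanged. In a learning-based pipeline the map would instead come from the network's prediction; because the output size is set by the map rather than by the input, the same routine applies whether the input shows a complete document or only a local text region.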