With the advent of mobile and hand-held cameras, document images have found their way into almost every domain. Dewarping these images to remove perspective distortions and folds is essential before document recognition algorithms can process them. To this end, we propose an end-to-end CNN architecture that takes a warped document image as input and produces a distortion-free document image. Because natural training data is scarce, we train the model on synthetically simulated warped document images. Our method is novel in three respects: a bifurcated decoder with shared weights that prevents intermingling of grid coordinates, residual networks in the U-Net skip connections that allow data to flow from different receptive fields in the model, and a gated network that helps the model focus on the structure and line-level detail of the document image. We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
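The abstract does not specify the output format of the network, so the following is only a minimal sketch of how predicted grid coordinates could be turned into a dewarped image. It assumes that the two decoder branches emit, for every output pixel, the x and y position to read from in the warped input (a backward map). The names `bilinear_sample`, `dewarp`, `grid_x`, and `grid_y` are hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional (x, y) position, clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def dewarp(warped: Image, grid_x: Image, grid_y: Image) -> Image:
    """Build the flat output by looking up each pixel's source location.

    Assumption: grid_x and grid_y hold source coordinates in pixel units,
    one value per output pixel, as two separate channels.
    """
    return [
        [bilinear_sample(warped, gx, gy) for gx, gy in zip(row_x, row_y)]
        for row_x, row_y in zip(grid_x, grid_y)
    ]


# Toy example: the "warp" is a one-pixel shift, so the grid points each
# output pixel one column further right in the warped image.
warped = [[0.0, 1.0, 2.0, 3.0],
          [0.0, 10.0, 20.0, 30.0]]
grid_x = [[1.0, 2.0, 3.0, 3.0] for _ in range(2)]
grid_y = [[0.0] * 4, [1.0] * 4]
flat = dewarp(warped, grid_x, grid_y)
```

Keeping `grid_x` and `grid_y` as separate arrays mirrors the idea of predicting the two coordinate channels in separate decoder branches. A real pipeline would use a vectorized resampling routine rather than nested Python lists.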