Restoring the original, flat appearance of a printed document from casual photographs of bent and wrinkled pages is a common everyday problem. In this paper we propose a novel method for grid-based single-image document unwarping. Our method performs geometric distortion correction via a deep fully convolutional neural network that learns to predict the 3D grid mesh of the document and the corresponding 2D unwarping grid in a multi-task fashion, implicitly encoding the coupling between the shape of a 3D object and its 2D image. We additionally create and publish our own dataset, called UVDoc, which combines pseudo-photorealistic document images with ground-truth grid-based physical 3D and unwarping information, allowing unwarping models to train on data that is more realistic in appearance than the commonly used synthetic Doc3D dataset, whilst also being more physically accurate. Our dataset is labeled with all the information necessary to train our unwarping network, without having to engineer separate loss functions to compensate for the lack of ground truth typically found in in-the-wild document datasets. We include a thorough evaluation demonstrating that our dual-task unwarping network, trained on a mix of synthetic and pseudo-photorealistic images, achieves state-of-the-art performance on the DocUNet benchmark dataset. Our code, results, and the UVDoc dataset will be made publicly available upon publication.
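To make the dual-task formulation concrete, the following is a minimal PyTorch-style sketch, not the authors' actual architecture: a shared fully convolutional encoder feeds two lightweight heads, one regressing the 3D grid mesh and one regressing the 2D unwarping grid, which is then used with `grid_sample` to flatten the photo. All names, layer sizes, and the grid resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadUnwarpNet(nn.Module):
    """Illustrative dual-task FCN: shared encoder, two grid heads."""
    def __init__(self, grid_h=45, grid_w=31):  # grid resolution is a placeholder
        super().__init__()
        # Shared fully convolutional encoder (depth/width are placeholders).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d((grid_h, grid_w))
        # Head 1: per-vertex 3D coordinates of the document mesh.
        self.head_3d = nn.Conv2d(128, 3, 1)
        # Head 2: per-vertex 2D sampling coordinates, normalized to [-1, 1].
        self.head_2d = nn.Conv2d(128, 2, 1)

    def forward(self, img):
        feat = self.pool(self.encoder(img))
        grid3d = self.head_3d(feat)              # (B, 3, grid_h, grid_w)
        grid2d = torch.tanh(self.head_2d(feat))  # (B, 2, grid_h, grid_w)
        return grid3d, grid2d

def unwarp(img, grid2d, out_size=(712, 488)):
    """Resample the photo through the predicted 2D grid to flatten it."""
    # Upsample the coarse grid to the output resolution, then reshape to
    # the (B, H_out, W_out, 2) layout that grid_sample expects.
    grid = F.interpolate(grid2d, size=out_size, mode="bilinear",
                         align_corners=True).permute(0, 2, 3, 1)
    return F.grid_sample(img, grid, align_corners=True)
```

In such a setup, both heads would be supervised directly from the grid-based 3D and unwarping annotations described above (e.g., with per-vertex regression losses), which is what removes the need for the engineered surrogate losses mentioned in the abstract.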