We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements called sprites. Our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained using weak supervision, is significant progress. Second, we demonstrate the potential of our method for new applications, more specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We evaluate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.