We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements called sprites. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters and can leverage line-level annotations when they are available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. These methods have mainly been evaluated on synthetic data in a completely unsupervised setting. Showing that they can be adapted to real images of text, evaluated quantitatively on them, and trained with weak supervision therefore constitutes significant progress. Second, we show the potential of our method for new applications, specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We demonstrate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.
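To make the idea of reconstructing a text line from a small set of sprites concrete, here is a toy sketch. It is not the authors' model, which learns its sprites with a deep network. It assumes fixed, hand-made sprites, one character per fixed-width slot, and plain least-squares matching, and the helper names (`render_sprite`, `compose_line`, `decode_line`) are hypothetical. Recognition here amounts to choosing, for each slot, the sprite whose rendering best reconstructs that part of the line.

```python
import numpy as np


def render_sprite(mask):
    """Ink mask in [0, 1] (1 = ink) -> grayscale patch on white (1 = white)."""
    return 1.0 - mask


def compose_line(sprites, sequence, height, spacing):
    """Reconstruct a text line by pasting one sprite per character slot.

    Assumption (toy setting): sprites has shape (K, h, w) with h <= height
    and w <= spacing, so sprites in neighbouring slots never overlap.
    sequence lists sprite indices, read left to right.
    """
    _, h, w = sprites.shape
    canvas = np.ones((height, spacing * len(sequence)))  # white background
    for slot, idx in enumerate(sequence):
        x = slot * spacing
        # Multiplicative compositing: darken the canvas where the sprite has ink.
        canvas[:h, x:x + w] *= render_sprite(sprites[idx])
    return canvas


def decode_line(image, sprites, spacing):
    """Label each slot with the sprite whose rendering best explains it.

    Returns the list of sprite indices and the total squared reconstruction error.
    """
    _, h, w = sprites.shape
    rendered = render_sprite(sprites)  # shape (K, h, w)
    labels, total_error = [], 0.0
    for x in range(0, image.shape[1] - w + 1, spacing):
        patch = image[:h, x:x + w]
        # Squared error between this slot and every candidate sprite.
        errors = ((rendered - patch) ** 2).sum(axis=(1, 2))
        best = int(np.argmin(errors))
        labels.append(best)
        total_error += float(errors[best])
    return labels, total_error
```

In the toy setting, a line composed from known sprites is decoded back to the same index sequence with zero reconstruction error. The real problem is harder because the sprites, their positions, and their appearance variations must all be learned from the line images themselves.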