Towards End-to-end Handwritten Document Recognition

Handwritten text recognition has been widely studied over the last decades because of its numerous applications. Today, the state-of-the-art approach is a three-step process: the document is segmented into text lines, which are then ordered and recognized. However, this three-step approach has several drawbacks. The three steps are treated independently even though they are closely related, and errors accumulate from one step to the next. The ordering step relies on heuristic rules, which prevents its use on documents with complex layouts or on heterogeneous documents. The approach also inherently requires additional physical segmentation annotations to train the segmentation stage. In this thesis, we propose to tackle these issues by recognizing the handwritten text of whole documents in an end-to-end way. To this end, we gradually increase the difficulty of the recognition task, moving from isolated lines to paragraphs, and then to whole documents. We first proposed a line-level approach based on a fully convolutional network, in order to design a generic feature extraction step for the handwriting recognition task. Building on this preliminary work, we studied two different approaches to recognizing handwritten paragraphs. We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets. Finally, we proposed the first end-to-end approach dedicated to the recognition of both text and layout at document level. Characters and layout tokens are predicted sequentially, following a learned reading order. We also proposed two new metrics, which we used to evaluate this task on the RIMES 2009 and READ 2016 datasets, at page level and double-page level.
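To make the idea of predicting characters and layout tokens in a single sequence more concrete, the following is a minimal sketch of how such a flat output could be turned back into a nested layout. The token names (`<page>`, `<para>` and their closing forms), the `Region` class and the `parse_tokens` function are assumptions made for illustration only; the abstract does not specify the actual token set or any post-processing.

```python
from dataclasses import dataclass, field

# Hypothetical layout tokens (an assumption): the abstract does not list
# the real token vocabulary used in the thesis.
OPEN_TO_CLOSE = {"<page>": "</page>", "<para>": "</para>"}
CLOSE_TOKENS = set(OPEN_TO_CLOSE.values())


@dataclass
class Region:
    """A layout region holding its own text and any nested regions."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def parse_tokens(tokens):
    """Rebuild a nested layout from a flat sequence of characters and layout tokens.

    The tokens are consumed in the order they were predicted, so the
    resulting tree preserves the reading order of the sequence.
    """
    root = Region("document")
    stack = [root]
    for tok in tokens:
        if tok in OPEN_TO_CLOSE:
            # Opening layout token: start a new region inside the current one.
            region = Region(tok.strip("<>"))
            stack[-1].children.append(region)
            stack.append(region)
        elif tok in CLOSE_TOKENS:
            # Closing layout token: it must match the most recently opened region.
            if len(stack) == 1 or OPEN_TO_CLOSE["<" + stack[-1].kind + ">"] != tok:
                raise ValueError(f"unexpected closing token {tok!r}")
            stack.pop()
        else:
            # Any other token is treated as a character of the current region.
            stack[-1].text += tok
    if len(stack) != 1:
        raise ValueError("sequence ended with unclosed layout tokens")
    return root


predicted = ["<page>", "<para>", "H", "i", "</para>",
             "<para>", "y", "o", "</para>", "</page>"]
document = parse_tokens(predicted)
page = document.children[0]
paragraph_texts = [p.text for p in page.children]
```

In this sketch the layout is recovered purely from the order of the tokens, which is why no separate segmentation or ordering stage appears in it; a real system would obtain `predicted` from the recognition model rather than from a hand-written list.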
