We present a neural-network-based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Because it is based on an image-to-sequence architecture, it can be trained to extract the text present in an image and sequence it correctly without imposing any constraints on language, shape of characters, or orientation and layout of text and non-text. The model can also be trained to generate auxiliary markup related to formatting, layout and content. We use a character-level token vocabulary, thereby supporting proper nouns and the terminology of any subject. The model achieves a new state of the art in full-page recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers (a dataset beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols), it performs better than all commercially available HTR APIs. It is deployed in production as part of a commercial web application.
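To illustrate why a character-level token vocabulary places no restriction on proper nouns or subject terminology, the following is a minimal sketch rather than the model's actual tokenizer. The special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names `build_vocab`, `encode` and `decode` are assumptions made for illustration; none of them are taken from the work described above.

```python
from typing import Dict, List

# Hypothetical special tokens; the abstract does not specify any.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character in the training transcripts to an integer id."""
    specials = [PAD, SOS, EOS, UNK]
    chars = sorted({ch for text in texts for ch in text})
    return {token: idx for idx, token in enumerate(specials + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a transcript into ids, wrapped in start and end markers."""
    unk_id = vocab[UNK]
    body = [vocab.get(ch, unk_id) for ch in text]
    return [vocab[SOS]] + body + [vocab[EOS]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode(), dropping padding and start/end markers."""
    inverse = {idx: token for token, idx in vocab.items()}
    skipped = {PAD, SOS, EOS}
    return "".join(inverse[i] for i in ids if inverse[i] not in skipped)


if __name__ == "__main__":
    vocab = build_vocab(["H2O at Zürich"])
    # A string never seen as a whole still round-trips, because every
    # one of its characters is in the vocabulary.
    ids = encode("Zürich H2O", vocab)
    print(decode(ids, vocab))
```

Because the target sequence is built one character at a time, any string whose characters occur in the training transcripts can be produced, whether or not the whole word was ever seen. Only a character absent from the training data falls back to the unknown token in this sketch.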