Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can substantially aid the digitization of handwritten documents and the reuse of their content. The task becomes even more challenging for historical documents, owing to the variability of writing styles and the degradation of page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Because convolutional kernels are defined on fixed grids and process all input pixels in the same way, regardless of content, as they slide over the image, this strategy disregards two facts: handwritten characters can vary in shape, scale, and orientation even within the same document, and ink pixels are more relevant than background pixels. To cope with these HTR-specific difficulties, we propose to adopt deformable convolutions, whose sampling grid deforms depending on the input at hand and thus adapts better to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.
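To make the idea of an input-dependent sampling grid concrete, the following is a minimal, dependency-free sketch of a single-channel deformable convolution. It is an illustration only, not the architectures evaluated in this work: the function names `bilinear` and `deformable_conv2d` are hypothetical, and the offsets are supplied by hand, whereas in a deformable layer they would be predicted from the input by an additional learned convolution.

```python
import math


def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); pixels outside count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Weight falls off linearly with distance to the neighbouring pixel.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deformable_conv2d(img, kernel, offsets):
    """Single-channel deformable convolution, stride 1, zero padding.

    offsets[i][j] holds one (dy, dx) pair per kernel tap for output pixel (i, j).
    Assumption: offsets are given explicitly here; a real layer would learn them.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    dy, dx = offsets[i][j][a * kw + b]
                    # Regular grid position of this tap, shifted by its offset.
                    y = i + a - kh // 2 + dy
                    x = j + b - kw // 2 + dx
                    acc += kernel[a][b] * bilinear(img, y, x)
            out[i][j] = acc
    return out


def constant_offsets(h, w, taps, dy, dx):
    """Helper for the example: the same (dy, dx) for every tap and output pixel."""
    return [[[(dy, dx)] * taps for _ in range(w)] for _ in range(h)]
```

With all offsets set to zero the function reduces to an ordinary convolution on the fixed grid; non-zero, fractional offsets move each tap off the grid, which is what lets the kernel follow strokes of varying shape, scale, and orientation.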