Low-resource Handwritten Text Recognition (HTR) is a hard problem because annotated data are scarce and linguistic information (dictionaries and language models) is very limited. This is the case, for example, for historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. In this paper, we address this problem with a data generation technique based on Bayesian Program Learning (BPL). In contrast to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol in the alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation-free fashion. Quantitative and qualitative analyses confirm the effectiveness of the proposed method.
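To make the line-synthesis step concrete, the following is a minimal sketch of how one exemplar per symbol could be turned into a labeled synthetic line image. It does not reproduce the BPL generator: a random pixel shift is used as a crude stand-in for sampling a new handwritten variant, and all names (`jitter_glyph`, `synthesize_line`, `exemplars`, `max_shift`, `max_gap`) are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np


def jitter_glyph(glyph, rng, max_shift=2):
    """Return a randomly shifted copy of a 2-D glyph image.

    Assumption: this simple shift only stands in for a real
    one-shot generator such as BPL, which is not implemented here.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Pad with background (0) so the shifted crop stays in bounds.
    padded = np.pad(glyph, max_shift, mode="constant")
    h, w = glyph.shape
    y0, x0 = max_shift + dy, max_shift + dx
    return padded[y0:y0 + h, x0:x0 + w]


def synthesize_line(exemplars, labels, rng, max_gap=3):
    """Concatenate perturbed glyphs into one line image.

    exemplars: dict mapping a symbol label to a single 2-D array
               (all arrays are assumed to share the same height).
    labels:    sequence of symbol labels forming the line.
    Returns the line image and its transcription; no per-symbol
    boxes are kept, so the pair suits segmentation-free training.
    """
    pieces = []
    for label in labels:
        glyph = jitter_glyph(exemplars[label], rng)
        gap_width = int(rng.integers(0, max_gap + 1))
        gap = np.zeros((glyph.shape[0], gap_width), dtype=glyph.dtype)
        pieces.extend([glyph, gap])
    return np.hstack(pieces), list(labels)


# Toy usage with two made-up 8x6 "symbols".
exemplars = {
    "a": np.ones((8, 6), dtype=np.uint8),
    "b": np.eye(8, 6, dtype=np.uint8),
}
line, target = synthesize_line(exemplars, ["a", "b", "a"], np.random.default_rng(0))
```

Each call yields an image/transcription pair, which is the form of supervision a line-level recognizer consumes. In a real pipeline the shift would be replaced by samples from the symbol generator.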