Annotating words in a historical document image archive for word image recognition demands time and skilled human resources such as historians and paleographers. In a real-life scenario, obtaining sample images for all possible words is also infeasible. Zero-shot learning methods, however, are well suited to recognizing unseen (out-of-lexicon) words in such historical document images. Building on Pho(SC)Net, the previous state-of-the-art method for zero-shot word recognition, we propose a hybrid model, Pho(SC)-CTC, which takes the rich features learned by Pho(SC)Net and feeds them to a connectionist temporal classification (CTC) framework that performs the final classification. Encouraging results on two publicly available historical document datasets and one synthetic handwritten dataset support the efficacy of both Pho(SC)-CTC and Pho(SC)Net.
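To make the final CTC classification step concrete, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not the authors' implementation: the function name `ctc_greedy_decode`, the choice of index 0 for the blank symbol, and the toy per-frame scores are all assumptions made for illustration, and a real system would obtain the per-frame scores from the network rather than from hand-written lists.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str,
    blank: int = BLANK,
) -> str:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of class k at frame t, where class 0 is
    the blank and class k >= 1 corresponds to alphabet[k - 1].
    """
    # Pick the highest-scoring class at every frame (the "best path").
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]

    decoded: List[str] = []
    previous = blank
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat of
        # the label at the immediately preceding frame.
        if label != blank and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Toy example with the hypothetical alphabet "to" (classes: blank, 't', 'o').
# The best path is t, t, blank, o, blank, o, o.
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
word = ctc_greedy_decode(toy_scores, "to")
```

The blank between the two runs of `o` is what lets CTC output a doubled letter. Because decoding works at the character level rather than by picking from a fixed lexicon, a CTC output layer can in principle spell words that never appeared in training.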