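Project title: Research on Online Handwritten Uyghur Word Recognition Based on Information Fusion
Project number: No.61263038
Project type: Regional Science Fund Project
Approval year: 2013
Discipline: Automation technology, computer technology
Project author: Dilmurat Tursun (地里木拉提·吐尔逊)
Affiliation: Xinjiang University
Funding: 450,000 yuan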
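Chinese abstract: The main difficulty in Uyghur handwritten word recognition is that letters cannot be precisely segmented before they are recognized, so connected segments that cannot be recognized directly remain. Drawing on the distinctive handwriting style of Uyghur words, this project studies an effective method that integrates segmentation with recognition: a recognition-based segmentation method first splits a word into connected segments, which are then split into a sequence of basic units (primitives). Consecutive primitives are merged into candidate characters to form a segmentation candidate lattice; geometric context, letter recognition information, and linguistic context are combined in the path evaluation criterion to obtain the optimal segmentation and the corresponding optimal recognition result. Beam search and dynamic programming are used to search for the optimal path during word recognition. The geometric information comprises unary geometric information for single letters and binary geometric information between letters, and is estimated statistically from the characteristics of the current word itself; the recognition information is supplied by the letter classifier and consists of candidate recognition results with their confidence scores; the semantic information is described by a letter-based language model. Uyghur is a script in common use in the ethnic minority regions of Xinjiang and in parts of Central Asia, so research on recognizing its handwriting is highly beneficial for advancing informatization in ethnic minority regions and for promoting international exchange.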
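Chinese keywords: online handwritten letters; letter segmentation; character recognition; segmentation-recognition fusion strategy; Uyghur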
English abstract: The main difficulty in Uyghur handwritten word recognition is that the basic characters cannot be precisely segmented, and many conjoined sections remain that cannot be directly recognized. This project researches an effective approach to online handwritten Uyghur word recognition based on an analysis of the unique shapes and writing styles of Uyghur words. Using an integrated recognition-segmentation method, words are segmented into conjoined sections, over-segmentation is applied to further split the conjoined sections into sequences of basic units, and these units are merged to obtain a segmentation candidate grid; the optimal segmentation and recognition result is achieved by fusing geometric analysis, an isolated character classifier, and semantic information. Beam search and dynamic programming algorithms are used for the optimal path search in the word recognition process. The geometric information, which includes unary and binary geometric information, is estimated on the current word to adapt to various writing styles. Recognition information is given by the character classifier as candidate results with their confidence values. Semantic information is described by a character-based model. The Uyghur language is widely used among the ethnic minorities in Xinjiang and t
English keywords: Online Handwritten Characters; Character Segmentation; Recognition; Information Fusion; Uyghur Script
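
Illustrative sketch: the path search over a segmentation candidate lattice that the abstracts describe could look roughly like the Python below. This is not the project's implementation. The function name `best_path`, the candidate tuple layout, the `"<s>"` start symbol, the weights, and the two scoring callbacks are all hypothetical, and only unary geometry is modelled (the binary geometric term between adjacent letters is left out for brevity). Each candidate character covers a run of consecutive primitives, and a path is scored as a weighted sum of classifier, language-model, and geometry log-scores. Dynamic programming keeps the best score per (position, last label) state, and the optional `beam` argument prunes the states expanded at each position.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate format: (start, end, label, classifier_log_score).
# A candidate covers primitives start..end-1, so end > start.
Candidate = Tuple[int, int, str, float]
State = Tuple[int, str]  # (primitives consumed so far, last label)


def best_path(
    n_primitives: int,
    candidates: List[Candidate],
    bigram_logp: Callable[[str, str], float],
    geometry_logp: Callable[[int, int], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    beam: Optional[int] = None,
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Return (score, segments) of the best path through the lattice."""
    w_cls, w_lm, w_geo = weights  # assumed fusion weights, not from the source

    # Index candidates by the primitive they start at.
    by_start: Dict[int, List[Candidate]] = {}
    for cand in candidates:
        by_start.setdefault(cand[0], []).append(cand)

    # best[state] = (accumulated score, previous state)
    best: Dict[State, Tuple[float, Optional[State]]] = {(0, "<s>"): (0.0, None)}

    for pos in range(n_primitives):
        # States ending at pos are final here: every candidate that ends at
        # pos starts earlier and was expanded in a previous iteration.
        states = sorted(
            ((score, state) for state, (score, _) in best.items() if state[0] == pos),
            reverse=True,
        )
        if beam is not None:
            states = states[:beam]  # beam pruning: expand only the top states
        for prev_score, prev_state in states:
            for start, end, label, cls_logp in by_start.get(pos, []):
                score = (
                    prev_score
                    + w_cls * cls_logp
                    + w_lm * bigram_logp(prev_state[1], label)
                    + w_geo * geometry_logp(start, end)
                )
                key = (end, label)
                if key not in best or score > best[key][0]:
                    best[key] = (score, prev_state)

    finals = [(s, st) for st, (s, _) in best.items() if st[0] == n_primitives]
    if not finals:
        return -math.inf, []  # no candidate sequence covers the whole word

    score, state = max(finals)
    segments: List[Tuple[int, int, str]] = []
    while best[state][1] is not None:  # walk the back-pointers to the start
        prev = best[state][1]
        segments.append((prev[0], state[0], state[1]))
        state = prev
    segments.reverse()
    return score, segments
```

Called with three primitives and a handful of toy candidates, the function returns the chosen segmentation as a list of `(start, end, label)` spans together with its fused score. Changing the language-model callback can switch the winning segmentation, which is the effect the fusion of information sources is meant to achieve.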