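Project title: Research on Off-line Handwritten Tibetan Character Recognition
Project number: No.61462072
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Computer Science
Project author: Huang Heming (黄鹤鸣)
Author affiliation: Qinghai Normal University (青海师范大学)
Funding: 470,000 yuan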
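Chinese abstract: Off-line handwritten character recognition is an important mode of automatic machine input. It overcomes the inherent drawbacks of manual keyboard entry and will become a mainstream form of computer input. However, a literature search carried out through a professional institution found that almost no one is working on off-line handwritten Tibetan character recognition, so the project team proposes to study this topic. First, the team will further improve its existing sample database of off-line handwritten Tibetan characters. Second, for the preprocessing stage, it will propose slant-angle normalization and size normalization methods suited to the characteristics of Tibetan characters. Third, it will propose a feature extraction method for Tibetan characters based on sparse representation and kernel principal component analysis, to improve the system's classification performance. Fourth, it will classify Tibetan characters with a two-stage classifier that cascades K-NN and sparse representation, to address the problems of too many character classes and too many samples per class. Finally, it will build a language model from the grammatical constraints among the characters within a Tibetan syllable and use it for post-recognition processing, further raising the character recognition rate. Success of this project would be significant for enriching character recognition theory, advancing the informatization of the Tibetan language and script, promoting scientific and technological development in Tibetan areas, and training core researchers in Tibetan information processing.
Chinese keywords: off-line; handwritten; Tibetan; character; recognition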
English abstract: Entering handwritten text into computer systems automatically is a growing trend, and off-line handwritten character recognition is an important way to achieve it. To date, however, there has been little research on off-line handwritten Tibetan character recognition, either at home or abroad. The project team therefore devotes itself to this challenging problem. First, the team will further complete the sample database of off-line handwritten Tibetan characters. Second, in the preprocessing stage, the team proposes a slant correction method and a size normalization method based on the characteristics of Tibetan characters. Third, the team proposes to extract features of Tibetan characters with methods such as sparse representation, kernel transforms, and Zernike moments. Fourth, in the classification stage, a cascaded multiple-classifier scheme is used to deal with the large number of Tibetan character classes. Finally, the team uses the constraints between the letters of a Tibetan syllable to further improve the recognition rate of the proposed off-line handwritten Tibetan character recognition system. This project will benefit the theoretical development of character recognition, the training of researchers in this field, the informatization of the Tibetan script, and the development of science and technology in Tibetan areas.
English keywords: off-line;handwritten;Tibetan;character;recognition
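
Illustrative note: the abstracts name a cascaded classifier (K-NN followed by sparse representation) but give no algorithmic detail. The following is a minimal sketch of how such a cascade could be arranged, not the project's actual method. The function names `omp` and `classify_two_stage`, the use of orthogonal matching pursuit for sparse coding, the minimum-residual decision rule, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed sparse coder).

    Approximates y as a combination of at most n_nonzero columns of D.
    """
    residual = y.astype(float)
    support = []
    sol = np.zeros(0)
    coef = np.zeros(D.shape[1])
    for _ in range(min(n_nonzero, D.shape[1])):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -1.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        coef[support] = sol
    return coef


def classify_two_stage(X_train, y_train, x, k=10, n_nonzero=5):
    """Stage 1: k-NN shortlists classes. Stage 2: sparse representation decides."""
    # Scale every feature vector to unit length (eps guards against zero vectors).
    Xn = X_train / (np.linalg.norm(X_train, axis=1, keepdims=True) + 1e-12)
    xn = x / (np.linalg.norm(x) + 1e-12)

    # Stage 1: the classes of the k nearest training samples become candidates.
    dists = np.linalg.norm(Xn - xn, axis=1)
    nearest = np.argsort(dists)[:k]
    candidates = np.unique(y_train[nearest])
    if candidates.size == 1:
        return candidates[0]

    # Stage 2: sparse-code the query over the candidate classes only,
    # then choose the class whose atoms reconstruct it with least error.
    mask = np.isin(y_train, candidates)
    D = Xn[mask].T              # columns are dictionary atoms
    labels = y_train[mask]
    coef = omp(D, xn, n_nonzero)
    residuals = {
        c: np.linalg.norm(xn - D[:, labels == c] @ coef[labels == c])
        for c in candidates
    }
    return min(residuals, key=residuals.get)
```

In this arrangement the inexpensive k-NN stage shrinks the dictionary that the costlier sparse-coding stage has to search, which is one plausible way a cascade could cope with many classes and many samples per class.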