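Project title: Research on Component-Based On-line Handwritten Tibetan Syllable Recognition Methods
Project number: No.61202220
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Computer Science
Project author: Ma Longlong (马龙龙)
Affiliation: Institute of Software, Chinese Academy of Sciences
Funding amount: CNY 230,000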
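Abstract (translated from Chinese): Information processing technology is important for Tibetan regions, yet current on-line handwritten Tibetan recognition techniques cannot fully support handwriting input with continuous writing. This project therefore takes on-line handwritten Tibetan syllable recognition as its research object. By analyzing the structural characteristics of Tibetan syllables and using components as the recognition primitives, it combines the strengths of statistical component recognition and component-based structural recognition to propose a component-based framework for on-line handwritten Tibetan syllable recognition. First, it studies a component-based segmentation algorithm for Tibetan syllables to resolve touching and overlapping between characters (字丁) and components. Second, it studies the construction of the four submodels to be integrated into the syllable recognition framework: a component classification model, a character-based language model, a character-component generation model, and a geometric model. Finally, starting from the over-segmentation of a syllable and following the idea of integrated segmentation and recognition, it integrates these four submodels into a unified recognition framework, studies information fusion and parameter learning for the multiple submodels, and evaluates segmentation and recognition hypotheses under the maximum a posteriori criterion to obtain the final segmentation and recognition result for the syllable. The key techniques from this research can be applied to mobile devices with pen-based interaction, and they lay a research foundation for the analysis and recognition of on-line handwritten Tibetan documents.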
Keywords (translated from Chinese): component; syllable; semi-automatic; rule; on-line handwritten Tibetan syllable recognition
English abstract: Tibetan information processing technologies play an important role in Tibetan areas. However, because of the limitations of existing on-line handwritten Tibetan recognition algorithms, the performance of continuous handwritten Tibetan input methods is not satisfactory. By analyzing the structural characteristics of Tibetan syllables, we propose an on-line handwritten Tibetan syllable recognition framework based on Tibetan components. The framework uses components as recognition units and combines the advantages of statistical component recognition methods with those of component-based structural recognition methods. First, a component-based Tibetan syllable segmentation algorithm is presented to handle stroke connection and severe overlap between characters or components. Second, the four submodels of the integrated syllable recognition framework are built: a component classification model, a character-based language model, a character-component generation model, and a geometric model. Finally, based on the syllable over-segmentation results, we adopt an integrated segmentation and recognition strategy to combine these four submodels into a principled recognition framework, and we study information fusion and parameter learning algorithms for integrating multiple models. The optimal syllable segme
English keywords: component; syllable; semi-automatic; rule; on-line handwritten Tibetan syllable recognition
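
Illustrative sketch (not part of the project record): the abstract describes over-segmenting a syllable, scoring candidate components with several submodels, and choosing the segmentation and labels with the highest posterior. The Python below is a minimal sketch of that general idea under stated assumptions, not the project's actual method. The function name `best_path`, the callback signatures, the log-linear weighting, and the bigram form of the language score are all hypothetical, and the character-component generation model is left out for brevity.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a candidate is (label, log-probability) for one span.
Candidate = Tuple[str, float]
Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def best_path(
    n: int,
    classify: Callable[[int, int], List[Candidate]],
    lang_logp: Callable[[str, str], float],
    geom_logp: Callable[[int, int], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    max_merge: int = 3,
) -> Tuple[float, List[Span]]:
    """Viterbi search over the lattice built from n over-segmented primitives.

    Each candidate component merges 1..max_merge consecutive primitives.
    A path score is a weighted sum of classifier, language, and geometry
    log-scores (an assumed log-linear fusion).
    """
    w_cls, w_lm, w_geo = weights
    # best[(position, last_label)] = (score, backpointer or None)
    best: Dict[Tuple[int, str], Tuple[float, object]] = {(0, "<s>"): (0.0, None)}
    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            # Snapshot the states ending at `start` before adding new ones.
            prev_states = [(lab, sc) for (pos, lab), (sc, _) in best.items()
                           if pos == start]
            if not prev_states:
                continue
            geo = w_geo * geom_logp(start, end)
            for label, cls_lp in classify(start, end):
                for prev_label, prev_score in prev_states:
                    score = (prev_score + w_cls * cls_lp
                             + w_lm * lang_logp(prev_label, label) + geo)
                    key = (end, label)
                    if key not in best or score > best[key][0]:
                        best[key] = (score, (start, prev_label))
    finals = [(sc, lab) for (pos, lab), (sc, _) in best.items() if pos == n]
    if not finals:
        return -math.inf, []
    score, label = max(finals)
    # Follow backpointers from the end of the syllable to the start.
    path: List[Span] = []
    key = (n, label)
    while best[key][1] is not None:
        start, prev_label = best[key][1]
        path.append((start, key[0], key[1]))
        key = (start, prev_label)
    path.reverse()
    return score, path


# Toy lattice with made-up scores: three primitives, five candidate spans.
toy = {
    (0, 1): [("a", math.log(0.6))],
    (1, 2): [("b", math.log(0.5))],
    (2, 3): [("c", math.log(0.5))],
    (0, 2): [("X", math.log(0.9))],
    (1, 3): [("Y", math.log(0.3))],
}
score, path = best_path(
    3,
    classify=lambda s, e: toy.get((s, e), []),
    lang_logp=lambda prev, cur: 0.0,  # uniform language score for the toy
    geom_logp=lambda s, e: 0.0,       # uniform geometry score for the toy
)
```

In the toy lattice, merging the first two primitives into "X" and keeping the third as "c" outscores both the three-way split and the "a" plus "Y" split, so the search returns that segmentation. In a real system the three callbacks would be replaced by trained models, and the weights would be learned rather than fixed at 1.0.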