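Project title: Highly Scalable Discriminative Learning Theory and Methods for Handwritten Chinese Text Recognition
Project number: No.61203260
Project type: Young Scientists Fund
Approval year: 2013
Discipline: Automation
Project author: Su Tonghua (苏统华)
Affiliation: Harbin Institute of Technology
Funding: 240,000 CNY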
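Chinese abstract: Handwritten Chinese character recognition is an important branch of pattern recognition; as a key tool for digitizing Chinese documents, it is of strategic significance to the nation. Handwritten Chinese text recognition is one of the most natural Chinese input technologies, but it faces a severe performance bottleneck. Massive training data and discriminative learning both help improve performance, but they carry a very high computational cost. This project aims to design highly scalable discriminative learning methods for distributed computing environments at three levels: theory, algorithm, and implementation. The research includes: 1) building a state-of-the-art baseline recognition system, in particular proposing a novel lightweight hidden Markov model; 2) studying a theoretical framework for distributed discriminative learning that supports analysis of the algorithms' convergence, generalization bounds, and complexity; 3) studying distributed discriminative learning methods for the generative baseline system and, by extending current computational models, implementing distributed discriminative learning for the handwritten Chinese text recognition system. The project will establish a complete and distinctive system of distributed discriminative learning and propose a series of innovative methods for discriminative learning of large-category sequential patterns. Ultimately, it aims to resolve the dual bottleneck of performance and efficiency in handwritten Chinese text recognition and to obtain important results that can be generalized to other fields.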
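Chinese keywords: Handwritten Chinese Character Recognition; Discriminative Learning; GPU Computing; Prototype Learning; Large-category Sequential Labeling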
English abstract: Handwritten Chinese character recognition is an important branch of pattern recognition and is of strategic significance for national document transcription. Although it is one of the most natural input interfaces, handwritten text recognition faces great challenges due to its limited performance. Using massive training data and discriminatively training the recognition models both help, provided that the heavy computational burden can be overcome. This project studies the theory, algorithm, and implementation aspects of distributed discriminative learning for sequential labeling tasks. The issues addressed include: 1) proposing a novel lightweight hidden Markov model-based recognition system; 2) establishing a theoretical framework for distributed discriminative learning and deriving the algorithms' convergence, generalization, and complexity properties; 3) developing more concrete computational models for baseline recognition systems and deploying them using distributed discriminative learning. The expected main contributions of the project are: 1) comprehensive theoretical guarantees for distributed discriminative learning; 2) innovative methods for large-category pattern recognition using discriminative learning techniques. The outputs of the project may resolve both the performance bottleneck and the efficiency bottleneck, a
English keywords: Handwritten Chinese Character Recognition; Discriminative Learning; GPU Computing; Prototype Learning; Large-category Sequential Labeling