Project title: Research on Online Handwritten Chemical Formula Recognition
Project number: No.61301238
Project type: Young Scientists Fund
Year approved: 2014
Discipline: Radio electronics; telecommunications technology
Principal investigator: Yang Jufeng (杨巨峰)
Affiliation: Nankai University (南开大学)
Funding: CNY 240,000
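Chinese abstract: As a universal international language of science, chemical formulas are widely used. To date, however, recognizing chemical formulas entered online quickly and accurately remains a difficult problem. This project designs a framework for the automatic recognition, analysis, and understanding of online handwritten chemical formulas. Treating online handwritten chemical formula recognition as an optimization problem, it selects features such as SIFT and SURF and combines them with traditional local and global features to capture the essential characteristics of chemical formulas from multiple perspectives. The problem is modeled with a multi-level conditional random field (CRF), so that the recognition results reflect formula features at different levels. Chemical rules and regularities are used to assist formula understanding. The project proposes an algorithm for analyzing the layout structure of chemical formulas that uses spatial position information, temporal information, and chemistry-specific domain knowledge to process formulas automatically while also handling various writing irregularities. The final goal is an online handwritten formula recognition system covering both inorganic and organic chemistry.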
Chinese keywords: deep learning;chemical formula;image processing;pen gesture;
English abstract: Chemical formulas and expressions are an essential means of communicating information and structure in the domain of chemistry. Despite the ubiquity of formulas, there is still a large gap between how people naturally interact with formulas and how computers understand them today. Our goal is to develop an intelligent formula-understanding system that provides a more natural way to specify chemical structures to a computer. In this research, we will study a new recognition framework and apply it to online handwritten chemical formulas. The framework combines a hierarchy of visual features into a joint model using a discriminatively trained conditional random field. This joint model of appearance makes the framework less sensitive to noise and drawing variations, improving accuracy and robustness. The key contributions of this research are: 1) A symbol recognition architecture that combines vision-based features at multiple levels of detail. 2) A discriminatively trained graphical model that unifies the predictions at each level and captures the relationships between symbols. 3) An energy function representing the recognition results of chemical formulas, which may help avoid subjectivity and improve accuracy. 4) A real-time formula recognition interface that will be evaluated by intended end-users and comp
English keywords: deep learning;chemical formula;image processing;pen-based gesture;