Research on Chinese Semantic Word Formation Based on Rule Learning

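Project title: Research on Chinese Semantic Word Formation Based on Rule Learning

Project number: No.61272215

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Project author: Kang Shiyong (亢世勇)

Affiliation: Ludong University (鲁东大学)

Funding: 720,000 yuan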

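Chinese abstract: In natural language understanding and machine translation systems, recognizing and understanding out-of-vocabulary words has long been a "bottleneck" that is hard to break through. Although more than half a century of scholarly effort has produced some results in semantic word formation, the problem has seen no breakthrough, and one important reason is the lack of detailed, reliable semantic word-formation rules. This project aims to use data mining and machine learning techniques, with human-machine interaction, to derive semantic word-formation rules and apply them to the semantic understanding of out-of-vocabulary words. The research mainly covers: (1) proofreading and expanding the existing Chinese Semantic Construction Database (《汉语语义构词数据库》) and drawing a training set from it; (2) extracting semantic word-formation rules with data mining techniques, using manual intervention to ensure the rules are accurate; (3) applying these rules to a validation set and arriving at the final rule set through repeated tuning; (4) applying the rules to the prediction and understanding of out-of-vocabulary words. In theory, the results can advance Chinese lexical semantics and enrich and refine its theory; in practice, they help advance computational linguistics, especially natural language understanding and machine translation, and also benefit the teaching of Chinese as a foreign language.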
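The four steps in the abstract can be pictured with a minimal sketch. It is not the project's actual method: the abstract does not name a mining algorithm, so this assumes a simple support/confidence filter over pairs of morpheme classes. The toy data, the class labels, the thresholds, and the function names `mine_rules`, `predict`, and `accuracy` are all hypothetical, and the manual review step is left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (semantic class of morpheme 1,
# semantic class of morpheme 2, semantic class of the whole word).
TRAIN = [
    ("plant", "part", "plant_part"),
    ("plant", "part", "plant_part"),
    ("plant", "part", "plant_part"),
    ("animal", "part", "animal_part"),
    ("animal", "part", "animal_part"),
    ("action", "person", "occupation"),
    ("action", "person", "occupation"),
    ("action", "person", "tool"),
]
VALIDATION = [
    ("plant", "part", "plant_part"),
    ("action", "person", "occupation"),
    ("animal", "part", "animal_part"),
    ("action", "person", "tool"),
]


def mine_rules(examples, min_support=2, min_confidence=0.6):
    """Keep one rule per morpheme-class pair if it is frequent and reliable."""
    counts = defaultdict(Counter)
    for first, second, word_class in examples:
        counts[(first, second)][word_class] += 1
    rules = {}
    for pattern, outcomes in counts.items():
        word_class, hits = outcomes.most_common(1)[0]
        total = sum(outcomes.values())
        # Assumed thresholds; a real system would tune them on the validation set.
        if hits >= min_support and hits / total >= min_confidence:
            rules[pattern] = word_class
    return rules


def predict(rules, first, second):
    """Return the predicted word class, or None when no rule applies."""
    return rules.get((first, second))


def accuracy(rules, examples):
    """Fraction of examples whose word class the rules predict correctly."""
    correct = sum(predict(rules, f, s) == c for f, s, c in examples)
    return correct / len(examples)


rules = mine_rules(TRAIN)
```

Here `accuracy(rules, VALIDATION)` plays the role of step (3), and calling `predict` on the morpheme classes of an unseen word plays the role of step (4); a pair with no matching rule simply returns `None`.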

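Chinese keywords: Out-of-vocabulary words; semantic word-formation rules; data mining and machine learning; classifier; predictive analysis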

English abstract: The recognition and understanding of out-of-vocabulary (OOV) words is a "bottleneck" problem in natural language understanding and machine translation. The community has made some progress on semantic word formation over the past decades, but the OOV problem remains unsolved, which we believe is mainly due to the lack of a knowledge base of reliable semantic word-formation rules. This project aims to extract such rules with data mining and machine learning techniques and to apply them to understanding OOV words. We will work on the following tasks: (1) proofread and expand the "Chinese Semantic Construction Database", from which we generate training data; (2) extract semantic word-formation rules using data mining techniques, with annotators checking the results to guarantee their accuracy; (3) test the rules on a validation set, from which the final rule set is produced; (4) apply the rule set to OOV prediction and understanding. The results of this project contribute to Chinese lexical semantics and strengthen its theoretical foundations; secondly, they can be an important resource for computational linguistics, especially for natural language understanding and machine translation; finally, we believe the community of teachi

English keywords: Unregistered words; Semantic word formation rules; Data mining and machine learning; Classifier; Predictive analysis
