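Project title: Research on Automatic Learning of Non-Taxonomic Relations in Chinese Domain Ontologies Based on Weakly Supervised Machine Learning
Project number: No.61300120
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology; computer technology
Principal investigator: Qiu Jing (仇晶)
Affiliation: Hebei University of Science and Technology
Funding: 230,000 CNY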
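Chinese abstract (translated): Set against the automatic construction of Chinese domain ontologies, this project studies the automatic learning of non-taxonomic relations in ontologies. It proposes a non-taxonomic relation recognition method that combines statistical analysis with a dependency language model, adding semantic information to the statistical model to achieve high-performance recognition. It proposes a non-taxonomic relation labeling method based on the shortest dependency path: the nearest common parent node of two concepts is used to determine the shortest dependency path between them and to capture the head verb between the concepts, which yields effective relation labels. It proposes a non-taxonomic relation instantiation method based on a weakly supervised machine learning algorithm: suitable lexical, syntactic, and semantic features are selected, syntactic kernel functions and composite kernel representations are designed, and the Bootstrapping algorithm serves as the weakly supervised learning method. The results are expected to advance Chinese domain ontology learning techniques, and they are significant for raising the degree of automation of Chinese domain ontology construction, strengthening the adaptability and robustness of Chinese domain ontology learning methods, and broadening the application of ontologies in Chinese-language settings.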
Chinese keywords (translated): ontology learning; non-taxonomic relation extraction; relation labeling; information extraction
English abstract: Set against the automatic construction of Chinese domain ontologies, this project studies the automatic learning of non-taxonomic relationships. A new method of non-taxonomic relation recognition is proposed that combines statistical analysis with dependency language models. Semantic information is added to the statistical model to improve recognition performance. A novel method of non-taxonomic relation labeling is proposed based on the shortest dependency path between two concepts. The nearest common parent node is used to determine the shortest path and to capture the verbs that can label the relationships effectively. A new method of extracting instances of non-taxonomic relations is proposed based on a weakly supervised machine learning algorithm. Suitable syntactic kernel functions are designed to express lexical, syntactic, and semantic features in detail. The Bootstrapping algorithm is chosen to extract instances of non-taxonomic relations. The results of this research are expected to advance the technologies of automatic Chinese domain ontology learning; improve the degree of automation of Chinese domain ontology construction and the adaptability and robustness of the ontology learning method; and expand the applica
English keywords: ontology learning; non-taxonomic relation extraction; relation labeling; information extraction
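
Illustrative sketch (not part of the original record): the labeling step in the abstract finds the shortest dependency path between two concepts through their nearest common parent node and then picks a verb on that path as the relation label. The Python below is a minimal sketch of that idea under stated assumptions, not the project's implementation. The head-index tree encoding, the function names `path_to_root`, `shortest_dependency_path`, and `label_verb`, the POS tags, and the rule "prefer the common parent if it is a verb" are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def path_to_root(heads: List[int], node: int) -> List[int]:
    """Return node, its head, its head's head, ... up to the root (head == -1)."""
    path = []
    while node != -1:
        path.append(node)
        node = heads[node]
    return path


def shortest_dependency_path(heads: List[int], a: int, b: int) -> Tuple[List[int], int]:
    """Return (path from a to b, nearest common parent) in a dependency tree.

    Assumption: heads[i] is the index of token i's head, and -1 marks the root.
    """
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    position_in_a = {n: i for i, n in enumerate(up_a)}
    for j, n in enumerate(up_b):
        if n in position_in_a:
            i = position_in_a[n]
            # Climb from a to the common parent, then descend to b.
            return up_a[: i + 1] + up_b[:j][::-1], n
    raise ValueError("tokens are not in the same dependency tree")


def label_verb(tokens: List[str], pos: List[str], path: List[int], lca: int) -> Optional[str]:
    """Pick a candidate relation label: the common parent if it is a verb,
    otherwise the first verb on the path (hypothetical heuristic)."""
    if pos[lca].startswith("V"):
        return tokens[lca]
    for i in path:
        if pos[i].startswith("V"):
            return tokens[i]
    return None


# Toy sentence with a hand-written parse (assumed, not produced by a parser).
tokens = ["aspirin", "treats", "headache"]
pos = ["NN", "VV", "NN"]
heads = [1, -1, 1]

path, lca = shortest_dependency_path(heads, 0, 2)
label = label_verb(tokens, pos, path, lca)
```

Here `path` runs from the first concept up to the common parent and down to the second concept, and `label` is the verb proposed for the relation between them. A real system would take `heads` and `pos` from a Chinese dependency parser rather than a hand-written parse.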