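Project title: Research on Knowledge Acquisition Methods for Chinese Nominal Concepts
Project number: No.61203284
Project type: Young Scientists Fund project
Approval year: 2013
Discipline: Automation
Project author: Wang Shi (王石)
Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)
Funding: 250,000 CNY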
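Abstract (translated from the Chinese): Nominal concepts carry a wealth of knowledge internally. Because this knowledge is commonsense in nature and highly compressed, corpus-based methods have difficulty acquiring it. Building on an existing large-scale Chinese nominal concept base and a hypernym knowledge base, this project studies new methods for acquiring semantic relations from within nominal concepts, as a useful complement to large-scale knowledge base construction. First, the hypernym knowledge base and a corpus are combined to measure the semantic-structure similarity between concepts; semantic relations of specified types are learned with supervision, and the resulting knowledge is validated with semantic rules. Second, soft hierarchical clustering is used for unsupervised learning of semantic relations of unspecified type within nominal concepts; lexico-syntactic patterns are used to analyze their meta-properties automatically, which assists manual naming of the relations and ultimately yields a hierarchical classification system of semantic relations. Finally, drawing on the word-formation regularities of Chinese nominal concepts, suffix frequency statistics and semantic validation rules are used to acquire suffix-type hypernym relations from nominal concepts, and metaphorical hypernym relations are identified and excluded based on metaphor vocabulary and contextual features. In research terms, the project may offer ideas for the hard problem of deep semantic analysis of natural language. In application, the method can be combined with corpus-oriented knowledge acquisition systems to build large-scale knowledge bases and provide resources for intelligent systems.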
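Keywords (translated from the Chinese): semantic analysis of nominal concepts; metaphor detection; knowledge acquisition; natural language understanding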
English abstract: Nominal concepts contain a large amount of knowledge that is hard to discover with traditional corpus-based approaches, because this knowledge is commonsense and highly compressed. Building on a previously constructed large-scale Chinese nominal concept base and hypernym knowledge base, this research focuses on mining semantic relations from nominal concepts and aims to complement existing ontologies. First, semantic-structure similarities between nominal concepts, measured mainly with the large-scale hypernym base, are used to train classifiers that detect given semantic relations in nominal concepts. To ensure accuracy, semantic rules learned automatically in an error-driven manner are adopted for knowledge validation. Second, soft hierarchical clustering algorithms are used to discover undefined semantic relations. Lexico-syntactic patterns are automatically learned for these relations to discover their meta-properties, which helps in defining the relations and finally constructing categories of semantic relations. Finally, taking advantage of the word-formation rules of Chinese nominal concepts, we use suffix frequency statistics to extract candidate hypernym relations from nominal concepts, and adopt semantic rules to validate them. Unaccepted metaphor nominal concepts are recognized based
English keywords: nominal concept semantic analysis; metaphor detection; knowledge discovery; natural language processing
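
Illustrative sketch: the suffix-based step described in the abstract could look roughly like the Python below. This is not the project's implementation. It assumes that concepts are plain strings, and that a proper suffix is proposed as a candidate hypernym only when it is itself a known concept and ends at least `min_count` concepts. The function name, parameters, and thresholds are hypothetical; semantic validation and metaphor exclusion are not covered.

```python
from collections import Counter


def suffix_hypernym_candidates(concepts, min_count=2, max_suffix_len=3):
    """Return (concept, suffix) pairs as candidate hypernym relations.

    Hypothetical helper: a suffix qualifies when it is a known concept
    and occurs as a proper suffix of at least `min_count` concepts.
    """
    vocab = set(concepts)

    def proper_suffixes(concept):
        # Proper suffixes only: the whole concept is never its own suffix.
        longest = min(max_suffix_len, len(concept) - 1)
        return [concept[-k:] for k in range(1, longest + 1)]

    # Suffix frequency statistics over the whole concept vocabulary.
    counts = Counter(s for c in vocab for s in proper_suffixes(c))

    pairs = []
    for concept in sorted(vocab):
        for suffix in proper_suffixes(concept):
            if counts[suffix] >= min_count and suffix in vocab:
                pairs.append((concept, suffix))
    return pairs
```

For a toy vocabulary such as 大学, 师范大学, 理工大学, the sketch proposes 大学 as a candidate hypernym of the two longer concepts. A metaphorical compound whose suffix is a frequent concept would pass this filter as well, which is why the abstract pairs the statistics with semantic validation and metaphor detection.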