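Project title: Research on Automatic Relation Extraction Techniques for the Open Domain
Project number: No.60803078
Project type: Young Scientists Fund project
Approval year: 2009
Project discipline: Metallurgy and Metal Technology
Principal investigator: Chen Jinxiu (陈锦秀)
Affiliation: Xiamen University (厦门大学)
Funding: CNY 200,000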
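Chinese abstract (translated): To meet the challenges posed by the information explosion, automated techniques are urgently needed to help people quickly find the information they actually need within massive amounts of data. Information extraction is becoming increasingly important in natural language processing. This project studies in depth a key supporting technology of information extraction, the relation extraction task, which enables computers to automatically identify relations between entities in free text. Current international research focuses mainly on supervised relation extraction, which learns extraction patterns from training samples to perform relation extraction within a specific domain. This approach requires people familiar with the domain to label training samples according to predefined rules, and it needs a sufficient quantity of training data to ensure extraction quality. To address these limitations, this project explores automatic relation extraction for the open domain. It proposes building relation candidates by fusing multiple sources of knowledge, constructing a graph-based relation extraction model, and making full use of readily available unlabeled samples to perform unsupervised learning on that graph model. This approach removes the burden of manually labeling samples, allows the technique to play an important role across application domains, and lays a foundation for the next generation of search engines based on automatic question answering.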
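Chinese keywords (translated): information extraction; relation extraction; multi-knowledge fusion; graph model; unsupervised learning;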
English abstract: To meet the challenge of the information explosion, automatic techniques are urgently needed to help us discover useful information. Information extraction is becoming increasingly important in natural language processing. As a key subproblem of information extraction, relation extraction is the task of detecting and classifying relationships between two entities in text. To overcome the shortage of manually labeled data required by supervised learning methods, our research aims to automate the process of relation extraction and investigates non-supervised learning solutions that rival supervised methods, so that relation extraction can be performed with minimal human cost, moving toward open-domain automatic relation extraction. To this end, we propose to construct domain-independent knowledge using a multi-information fusion technique, to represent each relation instance by extracting various lexical and syntactic features, and to develop graph-based models for the non-supervised relation extraction task that overcome the limitations of previous work.
English keywords: information extraction; relation extraction; multi-information fusion; non-supervised learning; graph-based model;