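Project title: Research on Methods for Open-Domain Dynamic Factual Information Acquisition and Fusion (开放域动态事实性信息获取及融合方法研究)
Project number: No. 61273321
Project type: General Program (面上项目)
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: Qin Bing (秦兵)
Affiliation: Harbin Institute of Technology
Funding: CNY 810,000 (81万元)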
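Abstract (translated from the Chinese): Open-domain dynamic factual information acquisition means extracting dynamic event information from web pages or documents in any domain on the Internet and presenting it in structured form. Most existing methods are restricted to specific domains. This project proposes methods for open-domain dynamic factual information acquisition and fusion. By studying automatic discovery of event trigger words in the open domain, together with recognition of open-domain entities and their types, it removes the domain restriction. In particular, unlike earlier event argument recognition methods, the project proposes a hypergraph-based event argument recognition method. This method considers not only the binary linguistic features between a trigger word and each candidate event argument, but also the multi-way linguistic features among the candidate arguments themselves, so that more linguistic features contribute to recognizing candidate arguments. Finally, the factual information extracted from large volumes of text is fused, redundant information is identified and removed, and a factual information knowledge base is built. Research on extracting and fusing open-domain dynamic factual information is of great significance for helping users mine and efficiently obtain useful information from Internet text, and for supporting higher-level natural language applications such as knowledge reasoning and question answering systems.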
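Keywords (translated from the Chinese): open-domain dynamic factual information; dynamic event trigger recognition; entity and entity type recognition; hypernym-hyponym relation recognition; factual information fusion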
English abstract: Open-domain dynamic factual information extraction refers to extracting factual event information from web pages or documents in any domain and presenting it in structured form. Previous methods are mostly confined to specific domains. We propose methods for open-domain information extraction and fusion. By studying automatic detection of event triggers in the open domain and recognition of open-domain entities and entity types, we remove the domain restriction. What especially distinguishes this work from previous argument recognition methods is the proposed hypergraph-based event argument recognition method, which takes into account not only the binary features between an event trigger and candidate event arguments but also the features among the candidate arguments themselves, so that more linguistic features are brought to bear on recognizing candidate arguments. Finally, by fusing the factual information extracted from a large collection of documents and by recognizing and removing redundant information, we construct a factual information knowledge base. Research on open-domain dynamic factual information acquisition and fusion is of great significance for helping users acquire useful information efficiently and for supporting higher-level natural language applications such as knowledge reasoning and question answering systems.
English keywords: Open-domain dynamic factual information; Dynamic event trigger recognition; Entity and type recognition; Hyponym-hypernym relation recognition; Factual information fusion