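Project title: Research on Scope Determination and Focus Identification in Natural Language Processing
Project number: No.61272260
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology, Computer Technology
Project author: Zhu Qiaoming (朱巧明)
Author affiliation: Soochow University (苏州大学)
Project funding: 800,000 yuan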
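Chinese abstract: Scope determination and focus identification determine, at the level of the affected span and the affected point respectively, the text fragment a user is interested in and the object the user is attending to. The two tasks complement and reinforce each other, have broad application value in natural language processing, and form one of the important foundations for deep semantic understanding at the sentence level. At present, research on scope determination has shortcomings in modeling and in the effective use of structured syntactic information, while research on focus identification has only just begun. Guided by linguistic theory, this project will study scope determination and focus identification in natural language processing in depth from several angles, including modeling, the use of structured syntactic information, and the data imbalance problem. The main research content includes: 1) a scope determination model based on shallow semantic parsing; 2) tree kernel-based scope determination; 3) focus identification based on a competition mechanism and centering theory; 4) solutions to data imbalance at both the data level and the algorithm level. In addition, to address the lack of Chinese corpora, this project will construct high-quality Chinese corpora of a certain scale for scope determination and focus identification, carry out in-depth research on Chinese scope determination and focus identification, and narrow the gap with related research on English.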
Chinese keywords: negation; uncertainty; trigger word detection; scope determination; focus identification
English abstract: Scope determination analyzes which part of a given sentence is of interest to the user, while focus identification further analyzes the specific object in which the user is most interested. As fundamental issues in deep semantic parsing at the sentence level, these two closely related and complementary tasks have many potential applications in natural language processing. Current work on scope determination, however, focuses on chunking-based approaches and fails to exploit structured syntactic information effectively, while research on focus identification is only just emerging. Under the guidance of linguistic theory, this project targets the key issues of scope determination and focus identification from several angles, such as computational modeling, the exploitation of structured syntactic information, and the handling of imbalanced data. The main content of this project includes: 1) a computational modeling framework for scope determination via shallow semantic parsing, 2) tree kernel-based scope determination, 3) focus identification using competition learning and centering theory, and 4) various solutions to imbalanced data at both the data level and the algorithm level. Last but not least, the project also aims to narrow the performance gap between Chinese and English by constructing high-quality corpora
English keywords: negation; speculation; cue detection; scope resolution; focus identification