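Project title: Theories, Methods and Applications of Multi-level Chinese Discourse Analysis
Project number: No. 61333018
Project type: Key Program
Approval year: 2014
Discipline: Automation technology; computer technology
Principal investigator: Zong Chengqing (宗成庆)
Institution: Institute of Automation, Chinese Academy of Sciences
Funding: 3 million yuan (RMB)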
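Chinese abstract: Discourse analysis, which builds on lexical-, phrase- and sentence-level analysis, is currently one of the core problems in natural language processing research. Compared with work on discourse theory and methods for English, research on theories and methods for Chinese discourse-level analysis lags behind and has only just begun in natural language processing. Drawing fully on existing theories and methods from China and abroad, and targeting the specific characteristics and regularities of Chinese, this project will establish a computable theoretical framework for describing Chinese discourse structure and analyzing its semantics, and will apply it to working systems. The main research topics are: (1) proposing theories, methods and models for multi-level discourse analysis, covering the analysis of Chinese discourse structural relations, topic analysis, and the description of cohesion and coherence; (2) based on the proposed theoretical models, establishing multi-level annotation guidelines for Chinese discourse and building a large-scale annotated Chinese discourse corpus; (3) studying and implementing the core algorithms of discourse analysis; (4) applying discourse analysis techniques to machine translation and question answering systems. This work is of significant scientific value and practical importance for enriching and developing computational linguistics and Chinese information processing, and for advancing related technologies.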
Chinese keywords: discourse analysis; machine translation; question answering system; cohesion; coherence
English abstract: Building on lexical-, phrase- and sentence-level analysis, discourse analysis has in recent years become one of the key issues in natural language processing research. However, Chinese discourse analysis is still at a very early stage and lags significantly behind English in both theory and methodology. This project aims to establish a computational theory for analyzing the logical structure and semantics of Chinese discourse by leveraging the state of the art, and to apply the results empirically in practical applications. In particular, the project focuses on the following research topics: 1) proposing a theory and model for analyzing Chinese discourse logical structure, topic structure, cohesion and coherence; 2) based on the proposed theory, developing an annotation scheme and building a large-scale discourse-annotated Chinese corpus; 3) studying and implementing the core algorithms of Chinese discourse analysis; 4) applying the research results to machine translation and question answering. We believe that the outcomes of this project will have great scientific significance and application value for Chinese information processing and Chinese computational linguistics, advancing the state of the art and filling research gaps in the automatic analysis and application of Chinese discourse.
English keywords: Discourse Analysis; Machine Translation; Question Answering; Cohesion; Coherence