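Project title: Unified Analysis of Chinese Morphological and Syntactic Structures (汉语词法与句法结构的统一分析)
Project number: No.61202162
Project type: Young Scientists Fund
Year approved: 2013
Discipline: Computer Science
Project author: Li Zhongguo (李中国)
Affiliation: Soochow University (苏州大学)
Funding: CNY 230,000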
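Chinese abstract: The boundary between morphology and syntax in Chinese is blurred, which causes performance bottlenecks in word segmentation, part-of-speech tagging, and syntactic parsing, the basic stages of Chinese language processing. Centering on this characteristic of Chinese, the project carries out a unified analysis of morphological and syntactic structures, removing the artificial division between morphology and syntax in automatic Chinese analysis. To this end, the project will examine Chinese word formation in depth, study the system of word-internal structures and the relation between word-internal structure and phrase structure, draw up a thorough annotation standard for word structure, and on that basis annotate the structures of 60,000 to 80,000 words in an existing treebank, yielding a treebank with integrated morphological and syntactic annotation. Building on this treebank, unified analysis models and corresponding parsing algorithms will be designed from both the constituency and the dependency perspectives, so that for a given unsegmented Chinese sentence the system output contains both morphological and syntactic structure. The unified analysis studied here can provide Chinese information processing systems with morphological and syntactic analyses that are easy to use, cover every level of linguistic granularity, and are efficient and accurate. It can also deepen our rational understanding of Chinese through computation and modeling. Carrying out the project therefore has significance for both engineering practice and scientific exploration.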
Chinese keywords: morphological analysis;syntactic analysis;unified analysis model;morphological structure system;
English abstract: There is no clearly defined boundary between morphology and syntax in Chinese. This has led to serious performance bottlenecks in Chinese word segmentation, part-of-speech tagging, and syntactic parsing. This project aims to design a unified parsing model and algorithm for analyzing Chinese morphological and syntactic structures, thus removing the somewhat artificial boundary between words and phrases in Chinese. To achieve this goal, we will systematically investigate the framework of word-internal structures in Chinese and develop a standard for annotating word structures. Based on this standard, we will annotate the structures of about 60 to 80 thousand words in an existing Chinese treebank. We will then design an effective unified parsing model in both the constituency and the dependency parsing frameworks, together with an efficient algorithm for parsing unsegmented Chinese sentences into their corresponding morphological and syntactic structures. The unified parsing framework of this project not only provides easier-to-use results of morphological and syntactic analysis, but also offers a unique opportunity to investigate the Chinese language by means of modeling and computation. Thus the success of this project will benefit both Chi
English keywords: morphological analysis;syntactic parsing;unified parsing model;morphological structure framework;