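Unified Analysis of Chinese Morphological and Syntactic Structures

Project title: Unified Analysis of Chinese Morphological and Syntactic Structures (汉语词法与句法结构的统一分析)

Project number: No. 61202162

Project type: Young Scientists Fund project

Year of approval: 2013

Discipline: Computer science

Project author: Li Zhongguo (李中国)

Affiliation: Soochow University (苏州大学)

Funding: 230,000 yuan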

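Abstract (translated from Chinese): In Chinese, the boundary between morphology and syntax is blurry, which creates performance bottlenecks in the basic stages of Chinese processing: word segmentation, part-of-speech tagging, and syntactic parsing. Building on this characteristic of the language, the project carries out a unified analysis of morphological and syntactic structure, removing the artificial division between morphology and syntax in the automatic analysis of Chinese. To this end, the project will examine Chinese word formation in depth, study the system of word-internal structures and the relation between word-internal structure and phrase structure, draw up a thorough annotation standard for word structure, and use it to annotate the structures of 60,000 to 80,000 words in an existing treebank, yielding a treebank with integrated morphological and syntactic annotation. On this basis, it will design unified analysis models and the corresponding parsing algorithms from both the constituency and the dependency perspective, so that for a given unsegmented Chinese sentence the system output contains both morphological and syntactic structure. The unified analysis studied here can give Chinese information processing systems morphological and syntactic analyses that are easy to use, cover all levels of linguistic granularity, and are efficient and accurate; it can also deepen our rational understanding of Chinese through computation and modeling. The project therefore has significance for both engineering practice and scientific exploration.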

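Keywords (translated from Chinese): morphological analysis; syntactic parsing; unified analysis model; system of morphological structures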

English abstract: There is no clearly defined boundary between morphology and syntax in Chinese. This has led to serious performance bottlenecks in Chinese word segmentation, part-of-speech tagging, and syntactic parsing. This project aims to design a unified parsing model and algorithm for analyzing Chinese morphological and syntactic structures, thus removing the somewhat artificial boundary between words and phrases in Chinese. To achieve this goal, we will systematically investigate the framework of the internal structures of Chinese words and develop a standard for annotating word structures. Based on this standard, we will annotate the structures of about 60 to 80 thousand words in an existing Chinese treebank. We will then design an effective unified parsing model in both the constituent analysis framework and the dependency parsing framework, together with an efficient algorithm for parsing unsegmented Chinese sentences into their corresponding morphological and syntactic structures. The unified parsing framework of this project not only provides easier-to-use results of morphological and syntactic analysis, but also gives us a unique opportunity to investigate the Chinese language by means of modeling and computing. Thus the success of this project will benefit both Chi

English keywords: morphological analysis; syntactic parsing; unified parsing model; morphological structure framework
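The abstract describes output that contains word-internal structure and phrase structure in one tree, built over an unsegmented sentence. The sketch below is one possible way to represent such a tree in Python, with one character per leaf. The `Node` class, the `is_word` flag, and the labels `ROOT` and `SUFFIX` are assumptions made for illustration; they are not the project's annotation standard, which this record does not give.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One node of a hypothetical unified tree (leaves are single characters)."""
    label: str                                   # phrase label, POS tag, or word-internal label
    children: List["Node"] = field(default_factory=list)
    char: str = ""                               # set only on leaves
    is_word: bool = False                        # True on the node spanning a whole word

    def text(self) -> str:
        """Concatenate the characters under this node."""
        if not self.children:
            return self.char
        return "".join(child.text() for child in self.children)


def leaf(ch: str) -> Node:
    """Build a character leaf."""
    return Node(label="CHAR", char=ch)


def words(node: Node) -> List[Tuple[str, str]]:
    """Read the segmentation and POS tags off the tree, left to right."""
    if node.is_word:
        return [(node.text(), node.label)]
    found: List[Tuple[str, str]] = []
    for child in node.children:
        found.extend(words(child))
    return found


# Hand-built example (assumed analysis): 语言学 is split into a root 语言 and a suffix 学.
tree = Node("IP", [
    Node("NP", [Node("PN", [leaf("他")], is_word=True)]),
    Node("VP", [
        Node("VV", [leaf("喜"), leaf("欢")], is_word=True),
        Node("NP", [
            Node("NN", [
                Node("ROOT", [leaf("语"), leaf("言")]),
                Node("SUFFIX", [leaf("学")]),
            ], is_word=True),
        ]),
    ]),
])

print(tree.text())
print(words(tree))
```

Because word boundaries are just nodes in the tree, the segmentation, the part-of-speech tags, and the phrase structure can all be read from a single analysis rather than from three separate pipeline stages.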


Related topic: lexical analysis

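In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. For Chinese, lexical analysis comprises word segmentation and part-of-speech tagging. Unlike most Western languages, written Chinese has no explicit space between words; sentences appear as unbroken strings of characters. The first task in Chinese natural language processing is therefore to split the input string into individual words, on which higher-level analysis is then built. This step is called word segmentation (or tokenization). Part-of-speech tagging is also usually regarded as part of lexical analysis: given a segmented sentence, it assigns each word a category, called a part-of-speech tag, such as noun, verb, or adjective.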
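The two steps described above, segmentation followed by part-of-speech tagging, can be sketched with a toy pipeline. The text does not name an algorithm, so this example swaps in forward maximum matching, a simple dictionary-based baseline, together with a dictionary lookup for tags. The `LEXICON` contents and the function names `segment` and `tag` are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy lexicon mapping words to a single part-of-speech tag (illustrative only).
LEXICON: Dict[str, str] = {
    "他": "pronoun",
    "喜欢": "verb",
    "语言": "noun",
    "语言学": "noun",
}


def segment(sentence: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    result: List[str] = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            # Fall back to a single character when nothing in the lexicon matches.
            if size == 1 or candidate in lexicon:
                result.append(candidate)
                i += size
                break
    return result


def tag(tokens: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign each word its lexicon tag, or "unknown" if it is not listed."""
    return [(token, lexicon.get(token, "unknown")) for token in tokens]


tokens = segment("他喜欢语言学", LEXICON)
print(tokens)
print(tag(tokens, LEXICON))
```

A greedy matcher like this commits to word boundaries before any syntactic information is available, which is the kind of pipeline separation that a unified analysis of morphology and syntax is meant to avoid.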
