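Project title: Chinese Deep Parsing Based on Combinatory Categorial Grammar
Project number: No. 61300064
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Principal investigator: 孙薇薇 (Sun Weiwei)
Affiliation: Peking University
Funding: 230,000 CNY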
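Chinese abstract: Deep parsing aims to recover grammatical information deeper than that provided by traditional phrase-structure and dependency parsing, and to offer a transparent interface to compositional semantic analysis. It has emerged in recent years as an important research topic. This project proposes to study Chinese deep parsing on the basis of Combinatory Categorial Grammar, seeking innovative results and research progress in both categorial grammar and Chinese syntactic parsing, and providing support for deep text analysis tasks such as semantic understanding of Chinese. To this end, we will focus on two core techniques of deep parsing, namely lexical disambiguation based on deep lexical processing and semantics-driven syntactic disambiguation, and will build a Chinese deep parser. Building on this, we will further study three statistical machine learning problems: ensemble learning of discriminative and latent-variable generative models, heterogeneous data fusion, and unsupervised lexical induction. These will be used to improve deep parsing from two directions: learning algorithms and expanded data sources. The ultimate goal of the project is to explore the problem of Chinese deep parsing, study the relevant core techniques, and build a high-quality language understanding system, thereby providing a useful reference for research areas such as text data mining, question answering, and machine translation.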
Chinese keywords: Combinatory Categorial Grammar;Deep Dependency Parsing;Transition-based Parsing;Factorization-based Parsing;Incremental Parsing
English abstract: Compared with shallow phrase-structure and dependency parsing, deep parsing provides more detailed syntactic information and a better-integrated interface to compositional semantics. It has drawn increasing attention over the past several years. This proposal is concerned with Chinese deep parsing based on Combinatory Categorial Grammar (CCG). The goal is to develop better deep parsing techniques, especially for the Chinese language. First, we will study (1) deep lexical processing techniques for lexical disambiguation and (2) semantics-driven models for syntactic disambiguation, which are the core modules of a deep parser. In addition, we will study (1) hybrid discriminative and symbol-refined generative learning, (2) heterogeneous treebank ensembles, and (3) unsupervised lexical acquisition. These statistical machine learning techniques can be applied to enhance deep parsers as well as many other NLP systems. We propose to study both linguistic and computational problems in deep parsing, and to build high-quality language understanding systems for Chinese. Our research will benefit work on text mining, question answering, and machine translation, among other areas.
English keywords: Combinatory Categorial Grammar;Deep Dependency Parsing;Transition-based Parsing;Factorization-based Parsing;Incremental Parsing