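Project title: Research on Models and Algorithms for Chinese-English Bilingual Dependency Parsing
Project number: No.61203314
Project type: Young Scientists Fund project
Year of approval: 2013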
Project discipline: Automation
Project author: Chen Wenliang (陈文亮)
Author affiliation: Soochow University (苏州大学)
Project funding: 240,000 CNY
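Chinese abstract (translated): Dependency parsing is a core research problem in natural language processing. Existing work is mainly based on monolingual text: the input is a monolingual sentence and the output is the corresponding dependency tree. With the emergence of large-scale aligned bilingual text and the application needs of dependency-driven statistical machine translation, bilingual dependency parsing has received growing attention in recent years. This project focuses on dependency parsing based on Chinese-English aligned bilingual text, that is, Chinese-English bilingual dependency parsing. The main research content includes: 1) studying model definitions, decoding algorithms, and feature representations for supervised bilingual dependency parsing; 2) studying feature representations based on large-scale corpora and the domain adaptation of features; 3) exploring an interactive learning mechanism between bilingual dependency parsing and statistical machine translation; 4) integrating the above results to build a unified platform for Chinese-English bilingual dependency parsing based on graph-based models. Research on Chinese-English bilingual dependency parsing is still at an early stage; this project will make important contributions to Chinese syntactic parsing technology and machine translation research, and has significant research and application value.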
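Chinese keywords (translated): dependency parsing; feature representation; syntactic parsing; semi-supervised learning; multilingual analysis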
English abstract: Dependency parsing is one of the most important research topics in natural language processing. Previous studies on dependency parsing focus on parsing monolingual sentences. In recent years, bitext dependency parsing has received increasing attention, both because large amounts of unlabeled bilingual sentences are available and because of applications such as dependency-based statistical machine translation (SMT). This project aims to improve bitext dependency parsing and to build a platform for Chinese-English bitext dependency parsing. The main content includes: 1) proposing supervised bitext dependency parsing models; 2) exploiting features based on large-scale data; 3) building a learning framework for interaction between bitext dependency parsing and SMT; and 4) integrating the above techniques to improve the performance of Chinese-English bitext dependency parsing and dependency-based SMT. Research on bitext dependency parsing is still at a very early stage. This project will make important contributions to Chinese syntactic parsing and machine translation.
English keywords: Dependency Parsing;Feature Representation;Syntax Parsing;Semi-supervised Learning;Multilingual Analysis