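Project title: Research on Skeleton-Based Syntactic Statistical Machine Translation Models
Project number: No.61300097
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Xiao Tong (肖桐)
Author affiliation: Northeastern University (东北大学)
Funding amount: 280,000 yuan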
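Chinese abstract: Statistical machine translation is one of the important research topics in natural language processing today. Although a number of successful statistical machine translation models have been proposed in recent years, how to make fuller use of the structural information of (source-language) sentences and of sentence skeleton information to further improve translation performance remains an important open scientific question. This project studies skeleton-based syntactic statistical machine translation and related scientific problems, covering automatic identification of the skeleton of Chinese sentences, modeling for skeleton-based syntactic statistical machine translation, and training and decoding of skeleton-based syntactic statistical machine translation models. The project is guided by data-driven methods and builds the overall machine translation framework by incorporating the prior knowledge that people form during translation. The choice of topic and its execution build on the years of machine translation research accumulated by the applicant's team (Natural Language Processing Lab, Northeastern University). All research results will be integrated into the open-source statistical machine translation system NiuTrans and shared with the academic community free of charge.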
Chinese keywords: machine translation;sentence skeleton;syntactic model;decoding;model training
English abstract: Statistical Machine Translation (SMT) is one of the most important sub-fields of Natural Language Processing (NLP). While several methods have been successfully developed in recent years, it is worth investigating new models that make better use of the structure of (source-language) sentences as well as the skeleton information encoded in translation. In this proposal we study a skeleton-based model for syntactic statistical machine translation. The problems we address include automatic identification of the skeleton of Chinese sentences, the skeleton-based syntactic statistical translation model, and training and decoding for skeleton-based statistical machine translation. The proposed methods and models benefit from data-driven methods and from prior knowledge of real-world translation. This work is inspired and supported by the previous work of our group (Natural Language Processing Lab, Northeastern University). All the techniques developed in the project will be integrated into the NiuTrans open-source statistical machine translation system, which will be released to the public with the support of this project.
English keywords: Machine Translation;Sentence Skeleton;Syntax-based Model;Decoding;Model Training