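Project title: Research on Statistical Machine Translation Based on a Chinese-English Bi-directional Tree-String Model
Project number: No.60872118
Project type: General Program
Approval year: 2009
Project discipline: Metallurgy and metalworking
Project author: Sun Guangfan (孙广范)
Author affiliation: China Center for Information Industry Development (中国电子信息产业发展研究院)
Project funding: 290,000 yuan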
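Chinese abstract (in translation): This project studies statistical machine translation based on a Chinese-English bi-directional tree-string model. The main research topics are: (1) A phrase acquisition model based on the bi-directional tree-string model, intended to address the loss of useful phrases under uni-directional tree-string or string-tree models; the project proposes to use the Chinese-English string-tree correspondences in the bi-directional model to bring in knowledge from English syntactic trees and recover the missed phrases. (2) Pre-decoding analysis of the large-scale structure of complex sentence patterns, together with a reordering model, intended to find a better way of handling long-distance reordering in statistical machine translation. (3) Acquisition of directional phrases using the bi-directional tree-string model, and a decoding algorithm based on directional phrases, intended to combine directional phrases effectively with tree-string alignment templates and improve both the efficiency and the output quality of the decoder. The research is expected to provide a feasible solution to two problems: the translation of non-syntactic-constituent phrases, which troubles syntax-based statistical machine translation, and long-distance reordering in phrase-based statistical machine translation. It thereby aims to advance research on syntax-based statistical machine translation.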
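Chinese keywords (in translation): statistical machine translation; syntax-based statistical machine translation; bi-directional tree-string model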
English abstract: This project studies statistical machine translation (SMT) based on a Chinese-English bi-directional tree-string model. It focuses on three topics: (1) A phrase acquisition model based on the bi-directional tree-string model. This addresses the problem of useful phrases being missed by uni-directional tree-string or string-tree models. The project aims to recover the missed phrases by using Chinese-English string-tree correspondences and knowledge from English syntactic trees. (2) Analysis of complex sentence structures and a reordering model applied before decoding. This aims to find a way of solving the problem of long-distance reordering in statistical machine translation. (3) The acquisition of directional phrases and a decoding algorithm based on directional phrases. This aims to integrate directional phrases with tree-string alignment templates and improve the performance of the decoder. The project is expected to provide a feasible solution for the translation of non-syntactic constituents in syntax-based SMT and for long-distance reordering in phrase-based SMT, and to promote the development of syntax-based SMT.
English keywords: statistical machine translation; syntax-based SMT; bi-directional tree-string model