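Project title: Research on Multi-Strategy Machine Translation Based on Probabilistic SC Grammar
Project number: No.61201351
Project type: Young Scientists Fund project
Year of approval: 2013
Discipline: Electronics and Information Systems
Project author: Feng Chong (冯冲)
Author affiliation: Beijing Institute of Technology (北京理工大学)
Project funding: RMB 240,000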
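Chinese abstract (in translation): Machine translation is an interdisciplinary research field. At present, rule-based machine translation (RBMT) and statistical machine translation (SMT), guided by rationalist and empiricist thinking respectively, have each made considerable progress, but each also has inherent problems. This project focuses on the strengths of both and relies on a relatively mature RBMT engine and large-scale corpora. It first addresses the probabilistic extension of SC grammar, studying and proposing a probabilistic extension model of SC grammar and a parameter estimation algorithm for it. It then studies how an SMT method based on the tree-to-string model can automatically correct the errors of the RBMT engine, and in particular how the probabilistic extension of SC grammar can help SMT optimize translation quality overall. Next, it designs and analyzes different multi-strategy machine translation schemes, exploring how to combine statistical and rule-based translation techniques to obtain better translations. By extending traditional theory in new ways and combining different methods, the project aims to improve the quality of machine translation output. Its results should also inform research that uses rule-based or statistical methods alone, especially on how to avoid or overcome the shortcomings of each method.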
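Chinese keywords (in translation): statistical machine translation; multi-strategy machine translation; system combination; translation model;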
English abstract: Machine Translation (MT) is a difficult interdisciplinary research topic. Rule-based Machine Translation (RBMT) and Statistical Machine Translation (SMT), which embody rationalism and empiricism respectively, have both made remarkable progress, yet each faces its own inherent challenges. Our proposal draws on the merits of both methods and builds on a mature RBMT engine and large-scale corpora. First, we will study the probabilistic extension of Sub Category (SC) Grammar, proposing a probabilistic extension model of SC Grammar and a parameter estimation algorithm for it. Then we will consider how to correct errors in RBMT output using an SMT engine based on a tree-to-string model, in particular by applying the probabilistic extension of SC Grammar. Several hybrid MT schemes will then be designed and analyzed. We hope that combining RBMT and SMT methods will improve translation quality. In summary, this project tries to improve current MT technology through creative extension of traditional methods and combined application of distinct algorithms. We believe our study will also be valuable for MT research on purely rule-based or purely statistical methods.
English keywords: SMT; Hybrid Machine Translation; MT Combination; Translation Model;