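Project title: Research on Key Technologies of Tree-Based Syntactic Translation Models
Project number: No.61272376
Project type: General Program
Year of approval: 2013
Discipline: Automation technology; computer technology
Project author: Zhu Jingbo (朱靖波)
Author affiliation: Northeastern University (东北大学)
Funding: 810,000 yuan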
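Chinese abstract: The core idea of statistical machine translation (SMT) is to assign a probability to every candidate translation and to select the candidate with the highest probability as the final output. Research on and development of SMT systems has become one of the core problems in natural language processing and in artificial intelligence more broadly, and SMT is already widely used in online translation and in machine-aided translation for restricted domains. This proposal focuses on several key problems in tree-based syntactic translation models, including tree-to-string and tree-to-tree models. The aim is to make better use of source-language syntactic structure to improve syntactic translation rule extraction and decoding search, and ultimately to improve translation quality. The main research topics are syntactic translation rule extraction, model training, feature weight tuning, decoding search, and evaluation of target-language syntactic structure. Finally, the results are to be integrated into NiuTrans, the open-source SMT system developed by the laboratory, and shared with researchers in China and abroad.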
Chinese keywords: machine translation; syntactic parsing; semantic analysis; machine learning; natural language processing
English abstract: Statistical machine translation (SMT) assigns a probability to each candidate translation and outputs the candidate with the maximum probability. SMT has become one of the key issues in natural language processing and in artificial intelligence more broadly, and SMT techniques are widely used in online translation and in domain-limited machine-aided translation. In this proposal, we focus on several key issues in tree-based syntactic translation models, including tree-to-string and tree-to-tree models. Our goal is to make better use of source parse trees to improve syntactic rule extraction and decoding, which in turn improves translation performance. The main topics of the proposal are syntactic translation rule extraction, model training, weight tuning, decoding, and target tree structure evaluation. Finally, we will integrate these techniques into NiuTrans, an open-source SMT platform developed by our group, and release it to the SMT community.
English keywords: machine translation; syntactic parsing; semantic parsing; machine learning; natural language processing