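Project title: Research on Pivot-Language Statistical Machine Translation Based on Topic Models
Project number: No. 61303082
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: 苏劲松
Affiliation: Xiamen University (厦门大学)
Funding: CNY 270,000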
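Chinese abstract (translated): The pivot language approach can overcome the shortage of bilingual corpora in statistical machine translation, and in recent years it has become one of the hot topics in machine translation research and industrial application. However, owing to the diversity and sparsity of language, current pivot language modeling methods cannot make full use of the context of pivot-language translation units, which harms the final model. To address this, the project proposes introducing topic models to build context-dependent pivot-language statistical machine translation. The main work includes: ① studying topic-model-based context representations that overcome the shortcomings of traditional methods and meet the modeling needs of pivot-language statistical machine translation; ② under the topic-model-based context representation, studying new word alignment modeling methods that incorporate pivot-language context; ③ under the topic-model-based context representation, studying new translation model methods that incorporate pivot-language context. The project makes full use of the strengths of topic models and advances pivot-language statistical machine translation from context-independent modeling to context-dependent modeling. It will offer a new way to make better use of pivot language methods when training resources are scarce, and it is of significance for machine translation of resource-poor languages.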
Chinese keywords: 统计机器翻译 (statistical machine translation); 枢轴语言 (pivot language); 主题模型 (topic model)
English abstract: The pivot language approach to statistical machine translation (SMT), which can overcome the shortage of parallel corpora, has become a hot topic in machine translation research and applications. However, because of the diversity and sparsity of language, conventional pivot language approaches make far from full use of pivot-side context information, which degrades the final models. In this project, we propose introducing topic models to build context-aware pivot-based SMT. The research covers three aspects: ① We study how to represent context with topic model information, so as to overcome the defects of conventional approaches and meet the modeling needs of pivot-based SMT. ② Based on this representation, we propose a word alignment model that uses topic-based context on the pivot side. ③ Based on this representation, we propose a translation model that uses topic-based context on the pivot side. By taking advantage of topic models, the project advances pivot-based SMT from context-free modeling to context-sensitive modeling. The project provides new insight into breaking down the resource barrier with pivot language approaches, and thus has important theoretical and practical signi
English keywords: statistical machine translation; pivot language; topic model
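
Illustrative note: the abstract contrasts context-free pivot modeling with context-sensitive modeling but gives no formulas. The sketch below is one possible reading, not the project's actual method. It assumes the common triangulation form p(t|s) = Σ_p p(p|s) · p(t|p) for the context-free case, and a hypothetical topic-conditioned variant p(t|s,d) = Σ_z p(z|d) Σ_p p(p|s,z) · p(t|p,z). All function names, the topic labels, and the toy probabilities are invented for illustration.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Context-free triangulation: p(t|s) = sum over pivots of p(p|s) * p(t|p)."""
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_p_given_s in pivots.items():
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                src_to_tgt[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(ts) for s, ts in src_to_tgt.items()}


def triangulate_with_topics(src_to_pivot_by_topic, pivot_to_tgt_by_topic, doc_topics):
    """Hypothetical topic-conditioned variant (an assumption, not from the source):
    p(t|s,d) = sum_z p(z|d) * sum_p p(p|s,z) * p(t|p,z)."""
    mixed = defaultdict(lambda: defaultdict(float))
    for z, prob_z in doc_topics.items():
        per_topic = triangulate(
            src_to_pivot_by_topic.get(z, {}), pivot_to_tgt_by_topic.get(z, {})
        )
        for s, ts in per_topic.items():
            for t, prob in ts.items():
                mixed[s][t] += prob_z * prob
    return {s: dict(ts) for s, ts in mixed.items()}


# Toy data (invented): Chinese source, English pivot, Spanish target.
# The pivot word "bank" is ambiguous between a financial bank and a riverbank.
SRC_TO_PIVOT = {"河岸": {"bank": 1.0}}
PIVOT_TO_TGT = {"bank": {"banco": 0.7, "orilla": 0.3}}

SRC_TO_PIVOT_BY_TOPIC = {
    "finance": {"河岸": {"bank": 1.0}},
    "nature": {"河岸": {"bank": 1.0}},
}
PIVOT_TO_TGT_BY_TOPIC = {
    "finance": {"bank": {"banco": 0.95, "orilla": 0.05}},
    "nature": {"bank": {"banco": 0.10, "orilla": 0.90}},
}
DOC_TOPICS = {"finance": 0.1, "nature": 0.9}  # assumed topic posterior p(z|d)

flat = triangulate(SRC_TO_PIVOT, PIVOT_TO_TGT)
topical = triangulate_with_topics(
    SRC_TO_PIVOT_BY_TOPIC, PIVOT_TO_TGT_BY_TOPIC, DOC_TOPICS
)

if __name__ == "__main__":
    print("context-free:", flat["河岸"])
    print("topic-aware: ", topical["河岸"])
```

In this toy setting the context-free table prefers "banco" for 河岸 (riverbank) because the pivot word hides the sense distinction, while the topic-weighted table prefers "orilla" once the document is mostly about nature. This is the kind of effect that context-sensitive pivot modeling is meant to capture, under the assumptions stated above.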