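Project title: Context Analysis for Statistical Machine Translation of Patent Documents
Project number: No.61303152
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Principal investigator: He Yanqing (何彦青)
Affiliation: Institute of Scientific and Technical Information of China (中国科学技术信息研究所)
Funding: 220,000 yuan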
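Chinese abstract (translated): To date, statistical machine translation systems for patent documents cannot meet the practical needs of document translation: they offer no workable strategy for translating long sentences, and they cannot use the surrounding context to translate at the discourse level. As a result, they are mostly used as aids to human translation or in combination with rule-based systems. This research attempts to combine long-sentence analysis with contextual analysis of patent documents and proposes a statistical machine translation method with automatic patent context analysis. Targeting the key problems and technical difficulties of patent machine translation, the method proposes a "high-precision long-sentence analysis method for patent documents" to simplify complex long sentences in patent text, a "context analysis function for patent documents at a moderate semantic granularity" to deepen the machine translation system's semantic understanding of sentences and of whole discourse, and a "statistical machine translation model based on patent context" to generate the target translation. Solving the key technical problems in this research will greatly improve the ability of machine translation systems to analyze the context of patent documents automatically and will yield statistical machine translation systems with higher accuracy. This is of theoretical significance for the field of machine translation and of practical value for patent document processing.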
Chinese keywords (translated): patent documents; context analysis; statistical machine translation; conceptual information units;
English abstract: Statistical machine translation systems for patent texts still cannot meet the quality requirements of practical translation. They can neither provide a reliable translation strategy for long sentences in patent texts nor automatically use the context of the source text to translate patent discourse. They are therefore mostly combined with rule-based machine translation or used as aids to human translation. This project combines long-sentence analysis with context analysis and proposes a statistical machine translation method with automatic context analysis. To address the key problems of patent texts, the research focuses on: 1) high-precision long-sentence analysis for patent texts, to simplify complex long sentences; 2) context analysis based on an appropriate semantic unit for patent texts, to improve the semantic understanding of the machine translation system; 3) context-based statistical machine translation for patent texts, to produce the target translation. Solving these key technical problems will greatly enhance the automatic context analysis ability of machine translation systems for patent texts and yield statistical machine translation systems with higher accuracy, which not only has important theoretical innov
English keywords: patent texts; context analysis; statistical machine translation; conceptual information units;