Research on Chinese-Vietnamese Bilingual Corpus Construction and Word Alignment Methods

Project title: Research on Chinese-Vietnamese Bilingual Corpus Construction and Word Alignment Methods

Project number: No.61262041

Project type: Regional Science Fund Project

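Year of approval: 2013

Discipline: Automation technology; computer technology

Project author: 郭剑毅 (Guo Jianyi)

Affiliation: 昆明理工大学 (Kunming University of Science and Technology)

Funding: 430,000 yuan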

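Chinese abstract: Chinese-Vietnamese bilingual language understanding is the foundation for strengthening cultural exchange between China and Vietnam, and Chinese-Vietnamese bilingual corpus resources are indispensable for it. Addressing the characteristics of the Vietnamese language, this project will first study Vietnamese treebank annotation methods and build a Vietnamese dependency treebank. Second, based on the syntactic characteristics of Vietnamese, it will study methods for identifying Vietnamese dependency relations and implement a Vietnamese dependency parser. Third, based on the syntactic characteristics of the Chinese-Vietnamese language pair, it will study Chinese-Vietnamese bilingual word alignment methods. Finally, it will study selection criteria and annotation specifications for Chinese-Vietnamese bilingual text, build a word-level aligned Chinese-Vietnamese bilingual corpus by annotating word alignments for 150,000 Chinese-Vietnamese sentences, and on that basis develop a prototype Vietnamese-Chinese bilingual sentence retrieval system. Together these steps address the difficult problems in building word-aligned Chinese-Vietnamese corpus resources, in dependency parsing, and in the word alignment process. The project's results will provide corpus resources and technical support for Chinese-Vietnamese bilingual retrieval and bilingual machine translation.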

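Chinese keywords: Vietnamese-Chinese; dependency treebank; dependency parsing; bilingual word alignment methods; bilingual word-aligned corpus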

English abstract: Understanding Chinese-Vietnamese bilingual language is the basis for strengthening cultural exchange between China and Vietnam, and a Chinese-Vietnamese bilingual corpus is an essential resource for that understanding. First, addressing the characteristics of the Vietnamese language, this project studies Vietnamese treebank tagging methods in order to build a Vietnamese dependency treebank. Second, it studies methods for identifying Vietnamese dependency relations, based on Vietnamese syntactic features, in order to implement a Vietnamese dependency parser. Third, based on the syntactic features of the Chinese-Vietnamese language pair, it studies methods of Chinese-Vietnamese bilingual word alignment. Finally, it studies the selection of Chinese-Vietnamese bilingual material and the annotation specification in order to construct a Chinese-Vietnamese bilingual word-level aligned corpus. On the basis of annotating a word-aligned corpus of 150,000 Chinese and Vietnamese sentences, it develops and implements a prototype system for Vietnamese-Chinese bilingual sentence retrieval, addressing the difficulties that arise in building a Chinese-Vietnamese bilingual word-aligned corpus, in dependency parsing, and in the word alignment process. The research achievement of the

English keywords: Vietnamese-Chinese; dependency treebank; dependency parsing; bilingual word alignment methods; bilingual word-aligned corpus
