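Project title: Research on Internet-Based Extraction of Chinese-Uyghur Scientific and Technical Terms
Project number: No.61463048
Project type: Regional Science Fund Project
Approval year: 2015
Discipline: Other
Project author: 米尔夏提·力提甫
Affiliation: Xinjiang University
Funding: CNY 450,000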
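Chinese abstract: Terms concentrate the core knowledge of a specific domain. Automatic term extraction helps people acquire and understand domain knowledge conveniently, while bilingual terms directly reflect the mappings and correspondences between languages and play an important role in natural language processing. Building on preliminary research, this project constructs a Chinese-Uyghur comparable corpus for the science and technology domain and studies practical methods for comparable-corpus-based Chinese-Uyghur bilingual term extraction, automatic acquisition of Chinese-Uyghur bilingual corpora, automatic document-level alignment of Uyghur-Chinese corpora, and a hybrid method for rule-based Uyghur term recognition and extraction. It develops a prototype system for extracting Chinese-Uyghur bilingual terms from Internet corpora, builds a repository of new Chinese-Uyghur bilingual terms for the science and technology domain, and extracts and compiles an aligned Chinese-Uyghur bilingual dictionary of new scientific and technical terms. These resources support Chinese-Uyghur machine translation and cross-language information retrieval, and promote the development of science and technology and the informatization of Xinjiang.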
Chinese keywords: Terminology;Comparable Corpus;Bilingual Alignment;Chinese-Uyghur
English abstract: Terms concentrate the core knowledge of a particular field. Automatic term extraction helps people access and understand domain knowledge conveniently and quickly. Moreover, bilingual terminology directly reflects the mappings and correspondences between languages, and it plays an important role in natural language processing. In this project, building on preliminary research, we will construct a Chinese-Uyghur comparable corpus for the science and technology domain and study practical methods for comparable-corpus-based Chinese-Uyghur bilingual term extraction, automatic acquisition of Chinese-Uyghur bilingual corpora, automatic document-level alignment of Chinese-Uyghur corpora, and a hybrid approach to rule-based Uyghur term detection and extraction. We will develop a prototype system for extracting Chinese-Uyghur bilingual terms from Internet corpora, build a repository of new terms, and extract and compile a Chinese-Uyghur bilingual dictionary of new scientific and technical terms, in order to support Chinese-Uyghur machine translation and cross-language information retrieval and to advance the development of science and technology and the informatization of Xinjiang.
English keywords: Terminology;Comparable Corpus;Bilingual Alignment;Chinese-Uyghur