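Project name: Research on Key Technologies for Uyghur Named Entity Recognition
Project number: No.61262060
Project type: Regional Science Fund Project
Year of approval: 2013
Project discipline: Automation technology, computer technology
Project author: 艾山·吾买尔
Author affiliation: Xinjiang University
Project funding: 450,000 yuan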
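Chinese abstract: Named entities are the main carriers of information: they express the main content of a text and are the basis for understanding it correctly, so named entity recognition is one of the simplest and quickest ways to grasp what an article is about. In natural language processing research, the quality of named entity recognition strongly affects lexical, syntactic, and semantic analysis; in applications, it is a foundational technology for research areas such as information extraction, machine translation, information filtering, and question answering. Many researchers in China and abroad have studied named entity recognition in depth, proposed numerous algorithms and models, achieved major breakthroughs, and developed usable recognition systems. To date, however, no scholar has carried out any systematic research on Uyghur named entity recognition, and the lack of this technology has become a bottleneck constraining the further development of Uyghur information processing. This project will use existing corpora to build a manually annotated corpus of 5 million word tokens, study the recognition of Uyghur person names, place names, and organization names in depth using rule-based, statistical, and other methods, and develop a recognition system that reaches a practical level of performance. The system will improve the performance of systems such as Chinese-Uyghur machine translation, Uyghur information retrieval, and Uyghur harmful-information filtering.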
Chinese keywords: Uyghur; person name; place name; organization name; agglutinative language
English abstract: Named entities are the main carriers of information: they express the main content of a text and are the basis for understanding it correctly, so Named Entity Recognition (NER) is one of the simplest and quickest ways to understand an article. In natural language processing research, the quality of NER is extremely important to morphological, syntactic, and semantic analysis; in applications, NER is a basic key technology for information extraction, machine translation, information filtering, question answering, and related areas. Many researchers in China and abroad have carried out in-depth studies on NER, proposed algorithms and models, made major breakthroughs, and developed applicable systems. So far, however, no scholar has carried out any systematic work on Uyghur NER, and the lack of NER has become a bottleneck in the development of Uyghur information processing. In this project, we build a manually tagged corpus of 5 million words from existing corpora, study Uyghur person name, place name, and organization name recognition using rule-based and statistical methods, and develop an applicable system. This system will improve the performance of systems such as Chinese-Uyghur machine translation, Uyghur information retrieval, and Uyghur illegal information filtering.
English keywords: Uyghur Language; Person Name; Location Name; Organization Name; Agglutinative Language