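Project title: Research on Cross-Language Information Retrieval Technology for Uyghur, Kazakh, and Kyrgyz
Project number: No.61262063
Project type: Regional Science Fund Project
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: 维尼拉·木沙江
Affiliation: Xinjiang University
Funding: 460,000 yuan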
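Chinese abstract (English translation): Xinjiang is rich in information resources in its minority languages, and these languages belong to the same language group as those of neighboring countries. With the rapid informatization and networking of minority languages, minority-language websites inside and outside China keep emerging, and the volume of online information is growing sharply. The resulting problem is the lack of a good information retrieval system: retrieving useful minority-language web information quickly, accurately, comprehensively, and conveniently is a demand of the times and an important problem that China urgently needs to solve. Many researchers in China and abroad have studied information retrieval in depth and proposed many algorithms, but no systematic research has yet addressed cross-language information retrieval for Uyghur, Kazakh, and Kyrgyz. This project takes the language model as its framework and aims to study the key technologies of a cross-language retrieval system for Uyghur, Kazakh, and Kyrgyz. Applying theories and techniques from statistics, data mining, web crawling, and computational linguistics, and taking the characteristics of these languages into account, it will systematically study stemming, information retrieval models, result-ranking models, and the construction of an association dictionary for Uyghur, Kazakh, and Kyrgyz, in order to solve the key technologies needed to realize cross-language information retrieval for these languages.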
Chinese keywords: Uyghur-Kazakh-Kyrgyz text identification;text proofreading;N-gram model;WordNet;query expansion
English abstract: Xinjiang is rich in information in minority languages, which belong to the same language group as languages used in neighboring countries. With the rapid informatization of minority languages and the spread of the Internet, a growing number of websites with large amounts of minority-language content have emerged. Meanwhile, the need for a search engine that can retrieve valuable information from these websites efficiently has grown sharply and has become an important problem to be solved. Although many researchers in China and abroad have worked on information retrieval and proposed many relevant algorithms, no systematic work has targeted minority languages such as Uyghur, Kazakh, and Kyrgyz. We propose a project that uses the language model as its framework and studies the main components of a Uyghur, Kazakh, and Kyrgyz search engine, drawing on probability and statistics, data mining, web crawling, and computational linguistics. Moreover, we systematically study stemming, information retrieval models, ranking models, and the construction of an association dictionary for these three languages
English keywords: UKK identification;text correction;N-gram model;WordNet;query expansion