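Project title: Research on Key Technologies and Methods for Monitoring Uyghur Language Resources
Project number: No. 61262066
Project type: Regional Science Fund Project
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: 玉素甫·艾白都拉
Author affiliation: Xinjiang Normal University
Funding: 480,000 yuan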
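Chinese abstract (translated): Monitoring technology for Uyghur language resources, and the monitoring of actual language use, is not only an important topic in urgent need of study in Xinjiang's social development and in minority-language information processing, but also a major social issue bearing on national stability, security, and international influence. Building on existing results on common Uyghur word stems and their standardization, and drawing on the characteristics of Uyghur and on statistical principles, this project takes the four major media as sources of authentic Uyghur text. From a computational linguistics perspective, it studies key technologies for the dynamic monitoring of Uyghur language and script resources, together with common Uyghur word stems and actual-use and dynamic corpora. The technology will address dynamic monitoring of Uyghur usage, build a Uyghur circulation corpus, produce a table of common Uyghur word stems, and provide a quantitative scientific basis for developing a language information resource monitoring system. It has major research and application value, particularly for research on public opinion analysis or information extraction, network content understanding, and multilingual intelligent software development; for establishing important support for national security informatization work such as technology-based stability maintenance in Xinjiang; and for ending the passive situation in which no nationally recognized table of common Uyghur word stems or dynamic circulation corpus exists. It is of great significance for building a harmonious society, for serving the autonomous region's culture, education, and economic and social development, and for technology-based maintenance and promotion of stability.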
Chinese keywords (translated): Uyghur language; monitoring technology; common word stem table; circulation corpus;
English abstract: Monitoring technology for Uyghur language resources, and the monitoring of actual language use, is not only an urgent research subject in Xinjiang's social development and in minority-language information processing, but also a significant social issue related to national stability, security, and international influence. Building on existing research on commonly used Uyghur word stems and their standardization, and drawing on the characteristics of the Uyghur language and on statistical principles, the project takes the four major media as its source of authentic Uyghur text. From the perspective of computational linguistics, it studies key technologies for the dynamic monitoring of Uyghur language resources, together with commonly used Uyghur word stems and actual-use and dynamic corpora. This technology will address the dynamic monitoring of Uyghur language use, construct a Uyghur circulation corpus, propose a table of commonly used Uyghur word stems, and provide a quantitative scientific basis for developing a language information resource monitoring system. Especially in public opinion analysis or information extraction, network content understanding, multilingual intelligent software development, and determining support for Xinjiang's technology-based stability and national security …
English keywords: Uyghur language; Monitoring technology; Commonly used word stem table; Circulation corpus;