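Project title: Research on Theory and Methods for Semantic Relation Extraction between Uyghur Named Entities
Project number: No.61462083
Project type: Regional Science Fund project
Year of approval: 2015
Discipline: Other
Project author: 卡哈尔江·阿比的热西提
Author affiliation: Xinjiang University
Funding: 460,000 yuan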
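Chinese abstract: Uyghur belongs to the Turkic branch of the Altaic language family. It is a morphologically complex language whose lexical and syntactic characteristics differ from those of English and Chinese. Representing and automatically extracting semantic relations between named entities in morphologically complex languages is an important scientific problem for Uyghur Internet information processing. This project takes semantic relation extraction between Uyghur named entities as its research goal. It focuses on drawing up a unified, information-processing-oriented annotation specification for semantic relations between Uyghur named entities and on developing a relation annotation tool driven by active learning, and on that basis builds a relation training corpus. It then studies a Uyghur relation feature extraction method, based on a random walk model, that combines supervised and semi-supervised learning, and finally develops a hybrid method for automatic extraction of semantic relations between named entities that fits the linguistic characteristics of Uyghur. The results will lay a solid foundation for automatic generation of Uyghur knowledge bases, the Semantic Web, intelligent information retrieval, automatic question answering systems, and natural language understanding research.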
Chinese keywords: Uyghur; entity relation extraction; semi-supervised learning; feature extraction
English abstract: Uyghur, which belongs to the Turkic branch of the Altaic language family, is a morphologically complex language whose morphological and syntactic characteristics differ from those of English and Chinese. Representing and extracting named entity relations in morphologically complex languages poses important scientific problems. This project studies semantic relation extraction between Uyghur named entities. It includes drawing up unified, standard annotation guidelines for semantic relations between Uyghur named entities for use in Uyghur information processing, developing an annotation tool for these relations based on active learning, and, on this basis, constructing a training corpus for relation extraction. We further study feature extraction methods, based on a random walk model, that combine supervised and semi-supervised machine learning, and finally develop a hybrid approach to Uyghur named entity semantic relation extraction that fits the characteristics of the Uyghur language. These results will lay a solid research basis for automatic generation of Uyghur knowledge bases, the Semantic Web, intelligent information retrieval, and natural language understanding.
English keywords: Uyghur language; Entity relation extraction; Semi-supervised learning; Feature extraction