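Project title: Research on Key Technologies for Ontology-Based Tibetan Corpus Retrieval
Project number: No.61262053
Project type: Regional Science Fund Project
Approval year: 2013
Discipline: Automation technology, computer technology
Project author: 多拉
Affiliation: Northwest Minzu University (西北民族大学)
Funding: 450,000 yuan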
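Chinese abstract: Implementing personalized, Ontology-based retrieval over Tibetan corpora benefits Tibetan linguistic research and the construction of various knowledge bases, as well as Tibetan search engines, Tibetan-Chinese machine translation, and text information extraction. This project combines knowledge engineering with machine learning to study and build classification libraries of Tibetan character components, characters, character stacks (字丁), and syllables. Because Tibetan function words, together with predicates, supply the syntactic framework and the semantic bridging of a sentence, the project studies and constructs domain knowledge systems or entities such as a Tibetan function word knowledge base, a predicate semantic mapping library, and a Tibetan conceptual semantic framework. It also builds a Tibetan word segmentation and tagging system based on recognition rules for agglutinated function words and a CRF (conditional random field) model. Taking full advantage of the fact that Ontology knowledge resources are shareable, reusable, and extensible, the project integrates Ontology with Tibetan information retrieval. At the lowest level of the system, this addresses the problem that characters, character stacks, syllables, and words are often "dismembered" in Tibetan retrieval; at the higher level, it enables semantic retrieval with conceptual understanding. The result is a new kind of Tibetan retrieval system that meets the personalized retrieval needs of different users.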
Chinese keywords: Tibetan ontology; domain ontology; Tibetan word segmentation; Tibetan syllable; semantic web
English abstract: Realizing Ontology-based Tibetan corpus retrieval benefits the study of Tibetan linguistics and the construction of various knowledge bases, as well as Tibetan search engines, Tibetan-Chinese machine translation, text information extraction, and so on. This project combines knowledge engineering with machine learning to study and build corpora of Tibetan characters, words, and syllables. Because Tibetan function words, together with predicates, play the role of syntactic structure and semantic bridging in sentences, the project studies and builds a knowledge base of Tibetan function words, a predicate semantic mapping library, and a Tibetan conceptual semantic framework as domain knowledge systems or entities. A Tibetan word segmentation and tagging system based on recognition rules for agglutinated function words and a CRF (conditional random field) model will be completed. Because Ontology knowledge resources can be shared and reused, and are also extensible, the project combines Ontology with Tibetan information retrieval to solve, from the bottom level of the system, the problem of characters, syllables, and words being broken apart in Tibetan retrieval. At the higher level, it realizes semantic retrieval with conceptual understanding, thereby completing a new Tibetan retrieval system that meets the retrieval needs of different users.
English keywords: Tibetan Ontology; Domain Ontology; Tibetan Word Segmentation; Tibetan Syllable; Semantic Web