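Project title: Research on the Theory and Methods of Semantic Relation Extraction for Tibetan Entities (藏文实体语义关系抽取理论与方法研究)
Project number: No. 61262054
Project type: Regional Science Fund Project
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: Yu Hongzhi (于洪志)
Affiliation: Northwest Minzu University (西北民族大学)
Funding: 430,000 yuan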
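Abstract (translated from Chinese): This project studies the theory and methods of semantic relation extraction for Tibetan entities, providing technical support for Tibetan public opinion analysis. Research content: study Tibetan case recognition, part-of-speech tagging, entity recognition, and automatic sentence segmentation, forming an integrated lexical analysis tool tailored to Tibetan; study Tibetan shallow parsing to automatically label the semantic roles of entities within chunks; build a Tibetan semantic knowledge base, study coreference resolution algorithms, and complete an entity semantic relation extraction model; build a Tibetan entity relation application platform that provides a unified service interface. Key scientific problems to be solved: recognition of case particles such as multi-category (兼类格) and contracted (紧缩格) case particles; a theoretical framework for Tibetan shallow parsing; automatic sentence segmentation of Tibetan text; Tibetan chunk recognition and chunk-internal structure; automatic semantic role labeling of Tibetan entities; construction specifications for a Tibetan semantic knowledge base oriented to entity relation extraction; the theory and methods of Tibetan entity semantic relation extraction. Innovations: an integrated method that combines Tibetan entity recognition with word segmentation and part-of-speech tagging; a coreference resolution algorithm based on Tibetan corpus statistics and Tibetan syntactic structure analysis; automatic Tibetan chunk recognition and chunk-internal structure analysis that combine Tibetan grammatical features with a Tibetan syntactically annotated corpus; an entity semantic relation analysis platform built on a Tibetan entity relation annotated corpus and semantic relation templates.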
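Keywords (translated from Chinese): Tibetan entity; semantic relation extraction; case recognition; shallow parsing; Tibetan semantic knowledge base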
English abstract: This project focuses on the theory and methods for semantic relation extraction of Tibetan entities, with the aim of providing technical support for Tibetan public opinion analysis. The main contents of this project are as follows: Tibetan case-auxiliary word recognition, part-of-speech tagging, entity recognition, and automatic sentence segmentation, so as to develop integrated tools for morphological analysis with Tibetan characteristics; shallow syntactic parsing of Tibetan to realize automatic annotation of intra-chunk entity semantic roles; establishment of a Tibetan semantic database and study of coreference resolution algorithms to realize a semantic relation extraction model for entities; construction of an application platform for Tibetan entity relations that provides a unified service interface. The key scientific problems to be solved are: the recognition of Tibetan case-auxiliary words such as multiple and condensed ones; the theoretical system of Tibetan shallow parsing; automatic sentence segmentation of Tibetan texts; Tibetan chunk recognition and the internal structure of chunks; automatic semantic role labeling of Tibetan entities; construction norms for a Tibetan semantic database oriented to entity relation extraction; the theory and methods of semantic relation extraction of Tibetan entities. Innovations from this project are: a
English keywords: Tibetan entity; semantic relation extraction; case-auxiliary word recognition; shallow parsing; Tibetan semantic database