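Project title: Research on Text Sentiment Classification Based on Multi-level Linguistic Granularity
Project number: No.60875040
Project type: General Program
Year of approval: 2009
Discipline: Light industry; handicraft industry
Project author: Wang Suge (王素格)
Affiliation: Shanxi University
Funding: CNY 300,000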
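Chinese abstract: This project carried out a systematic study of text sentiment classification at multiple levels of linguistic granularity. The main results are as follows: (1) Resources for Chinese sentiment classification were built: a sentiment lexicon, a collocation base, a sentence base, and a text corpus. (2) Using semantic relations between words, methods were proposed for identifying the sentiment orientation of words and collocations, the intensity of sentiment orientation was quantified, and these results were applied to determining the sentiment orientation of sentences. (3) From the perspective of each feature's ability to discriminate between classes, a criterion function based on Fisher's discriminant was designed, yielding an efficient feature selection method for text sentiment classification. (4) Using feature orientation intensity, a text representation model based on two-tuple attributes was established. An attribute discretization method based on the order of sentiment orientation intensity was proposed; it embeds feature selection in the discretization process and thereby reduces data dimensionality. Using feature orientation intensity, a weighted rough membership was defined for the sentiment classification of new texts. (5) Concept lattices and granular computing were introduced into ontology research, providing a unified knowledge acquisition model, based on a domain ontology base and operating at different granularities, for ontology construction, merging, and connection; the model gives experts some support in judging relations between concepts and between ontologies. (6) A distance between two concepts was defined through the rough membership function, and an algorithm was designed to visualize how clustering results evolve under different sliding windows. (7) These theoretical results were applied to the automobile and tourism domains. They enrich the theory of text sentiment classification and also provide new methods and techniques for processing subjective text data.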
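Item (3) ranks candidate features by how well they separate the two sentiment classes. The project's improved Fisher criterion is not spelled out here, so the sketch below uses only the classic two-class Fisher discriminant ratio, F(t) = (mu_pos - mu_neg)^2 / (var_pos + var_neg), computed on the binary presence of term t in each document. The binary presence features, the function names `fisher_ratio` and `select_features`, and the toy review tokens are illustrative assumptions, not the project's implementation.

```python
from statistics import mean, pvariance

def fisher_ratio(pos_values, neg_values):
    """Classic two-class Fisher discriminant ratio for one feature:
    (mu_pos - mu_neg)^2 / (var_pos + var_neg).
    NOTE: this is NOT the project's improved variant."""
    mu_p, mu_n = mean(pos_values), mean(neg_values)
    denom = pvariance(pos_values, mu_p) + pvariance(neg_values, mu_n)
    if denom == 0:
        # No within-class scatter: infinite score if the means differ.
        return float("inf") if mu_p != mu_n else 0.0
    return (mu_p - mu_n) ** 2 / denom

def select_features(docs, labels, k):
    """Score every term by the Fisher ratio of its 1/0 presence per
    document (an assumption) and return the top-k terms plus all scores.
    labels: 1 = positive, 0 = negative."""
    vocab = sorted({t for d in docs for t in d})
    pos = [set(d) for d, y in zip(docs, labels) if y == 1]
    neg = [set(d) for d, y in zip(docs, labels) if y == 0]
    scores = {}
    for term in vocab:
        p = [1.0 if term in d else 0.0 for d in pos]
        n = [1.0 if term in d else 0.0 for d in neg]
        scores[term] = fisher_ratio(p, n)
    # Highest score first; ties broken alphabetically.
    top = sorted(vocab, key=lambda t: (-scores[t], t))[:k]
    return top, scores

# Toy tokenized car reviews (hypothetical data).
docs = [["good", "comfortable"], ["good", "smooth"], ["good", "noisy"],
        ["bad", "noisy"], ["bad", "good"], ["bad", "noisy"]]
labels = [1, 1, 1, 0, 0, 0]
top, scores = select_features(docs, labels, k=2)
print(top)  # → ['bad', 'good']
```

A term that appears in every document of one class and in none of the other has zero within-class scatter and so gets an infinite score here; a practical implementation would likely smooth the variance estimates, which is one place where an improved criterion could differ from this classic form.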
Chinese keywords: text sentiment classification; multi-level linguistic granularity; rough set theory; feature selection; ontology
English abstract: In this project, text sentiment orientation classification methods have been systematically studied from the perspective of multi-hierarchy linguistic granularity. The main results are as follows: (1) Resources have been established, including a sentiment word table, a collocation base, a sentence base, and corpora. (2) By using semantic relationships between words and quantifying their sentiment orientation intensity, sentiment orientation identification methods for words, collocations, and sentences are proposed. (3) From the viewpoint of a candidate feature's contribution to distinguishing text categories, an effective feature selection method based on an improved Fisher's discriminant ratio is proposed for text sentiment classification. By considering two kinds of probability estimation, four feature selection techniques are then proposed. (4) A method of text sentiment classification based on weighted rough membership is proposed. In this method, a text representation model based on two-tuple attributes is established by introducing feature orientation intensity into the vector space representation. An attribute discretization method based on the order of sentiment orientation intensity is proposed; it unifies feature selection with the discretization process to reduce data dimensionality. Using the feature orientation intensity, a weighted rough membership is defined for classifying new sentiment texts. (5) By introducing concept lattices and granular computing into ontology learning, a unified research model is presented for ontology building, ontology merging, and ontology connection at different granularities, based on the domain ontology base. In this model, based on similarity models, ontology building, merging, and connection can be carried out at different granularities with the help of domain experts.
(6) By defining the distance between two concepts through the rough membership function, an algorithm based on the sliding-window technique is proposed for clustering time-evolving data and visualizing how the clustering results evolve. (7) The theoretical results mentioned above are applied to the car and tourism domains. These results not only enrich the theory of text sentiment orientation classification but also provide new theory and effective technology for subjective text data processing.
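Item (4) classifies a new text by a weighted rough membership. The standard rough membership of an object x in a class X, relative to an attribute set B, is |[x]_B ∩ X| / |[x]_B|, where [x]_B is the set of training texts that agree with x on every attribute in B. The abstract does not state the weighting scheme, so the sketch below assumes a hypothetical one: single-attribute memberships are averaged with weights standing in for feature orientation intensity, and the text goes to the class with the larger score. The discretized attribute values, the weights, and the function names are illustrative assumptions; the two-tuple representation and the intensity-order discretization are not modeled.

```python
def rough_membership(universe, labels, x, attrs, target):
    """Standard rough membership |[x]_B ∩ X| / |[x]_B|.
    [x]_B: training objects equal to x on every attribute in attrs.
    X: training objects labelled `target`.
    Returns None when [x]_B is empty (no training object matches x)."""
    block = [i for i, u in enumerate(universe)
             if all(u[a] == x[a] for a in attrs)]
    if not block:
        return None
    return sum(1 for i in block if labels[i] == target) / len(block)

def weighted_rough_membership(universe, labels, x, weights, target):
    """HYPOTHETICAL weighting: a weighted average of single-attribute
    rough memberships, attribute a weighted by weights[a] (standing in
    for the feature's sentiment orientation intensity)."""
    num = den = 0.0
    for a, w in weights.items():
        mu = rough_membership(universe, labels, x, [a], target)
        if mu is not None:  # skip attributes where no training text shares x's value
            num += w * mu
            den += w
    return num / den if den else 0.0

def classify(universe, labels, x, weights, classes=("pos", "neg")):
    """Pick the class with the largest weighted rough membership."""
    return max(classes, key=lambda c: weighted_rough_membership(
        universe, labels, x, weights, c))

# Hypothetical discretized training texts: 1 = feature present, 0 = absent.
universe = [{"good": 1, "noisy": 0}, {"good": 1, "noisy": 1},
            {"good": 0, "noisy": 1}, {"good": 0, "noisy": 0}]
labels = ["pos", "pos", "neg", "neg"]
weights = {"good": 0.9, "noisy": 0.6}  # hypothetical intensities
x = {"good": 1, "noisy": 1}
print(classify(universe, labels, x, weights))  # → pos
```

With these toy values the positive class scores (0.9 × 1.0 + 0.6 × 0.5) / 1.5 = 0.8 and the negative class scores 0.2, so the strongly weighted feature "good" dominates the decision.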
英文关键词: text sentiment classification; multi-hierarchy linguistic granularity; rough set theory; feature selection; ontology