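Project title: Multigranulation rough computing models and algorithms for hybrid data
Project number: No.61303008
Project type: Young Scientists Fund project
Year approved: 2014
Discipline: Automation technology; computer technology
Principal investigator: Wei Wei (魏巍)
Affiliation: Shanxi University (山西大学)
Funding: CNY 250,000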
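Chinese abstract: Hybrid data, in which nominal, numerical, ordered and interval-valued attributes coexist, are widespread in real life, and their complex structure and heterogeneous form challenge traditional data analysis methods. Drawing on the human problem-solving strategies of multigranular cognition and approximate reasoning, this project studies multigranulation rough analysis models and algorithms for hybrid data. The main research content is: (1) study the rough approximations of a target concept under different types of attributes, explore methods for fusing these different types of rough approximations, and build a multigranulation rough set model for hybrid data; (2) propose roughness measures that effectively characterize the size and structure of the decision boundary region of a target under the different attribute types in hybrid data; (3) propose a method for evaluating attribute subsets based on a roughness measure for hybrid data, and build an acceleration strategy for heuristic attribute reduction that shrinks the data in both the object and the attribute directions at the same time; (4) propose a roughness-based decision tree generation algorithm and build a random forest classification method for hybrid data. The results of this project will provide new approaches to knowledge discovery from hybrid data and have significant theoretical and practical value for research in data mining, machine learning and related fields.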
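Items (1) and (2) can be illustrated with a minimal sketch, assuming one granulation per attribute type: an equivalence relation for a nominal attribute, a delta-neighborhood for a numerical attribute, and a dominance-style granule for an ordered attribute (interval-valued attributes are omitted). The abstract does not state how the project fuses the approximations, so the code swaps in the optimistic multigranulation fusion known from the rough set literature, and measures roughness as 1 - |lower| / |upper|. All function names, the threshold 0.3 and the toy data are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Obj = Dict[str, Any]
Relation = Callable[[Obj, Obj], bool]


def nominal(attr: str) -> Relation:
    # Equivalence relation: objects with the same category are indistinguishable.
    return lambda x, y: x[attr] == y[attr]


def numerical(attr: str, delta: float) -> Relation:
    # Delta-neighborhood relation: values at most delta apart are indistinguishable.
    return lambda x, y: abs(x[attr] - y[attr]) <= delta


def ordered(attr: str) -> Relation:
    # Simplified dominance granule: y joins x's granule if y is at least as good as x.
    return lambda x, y: y[attr] >= x[attr]


def granule(data: List[Obj], i: int, rel: Relation) -> Set[int]:
    # Indices of the objects related to object i under one granulation.
    return {j for j in range(len(data)) if rel(data[i], data[j])}


def multigranulation_approx(
    data: List[Obj], relations: List[Relation], target: Set[int]
) -> Tuple[Set[int], Set[int]]:
    # Optimistic fusion (assumed): x is in the lower approximation if ANY granule
    # of x lies inside the target, and in the upper approximation if EVERY
    # granule of x meets the target.
    lower: Set[int] = set()
    upper: Set[int] = set()
    for i in range(len(data)):
        granules = [granule(data, i, rel) for rel in relations]
        if any(g <= target for g in granules):
            lower.add(i)
        if all(g & target for g in granules):
            upper.add(i)
    return lower, upper


def roughness(lower: Set[int], upper: Set[int]) -> float:
    # Pawlak-style roughness: 0 means the target is exactly definable.
    return 1 - len(lower) / len(upper) if upper else 0.0


data = [
    {"color": "red", "size": 1.0, "grade": 2},
    {"color": "red", "size": 1.2, "grade": 3},
    {"color": "blue", "size": 3.0, "grade": 1},
    {"color": "blue", "size": 3.1, "grade": 2},
    {"color": "red", "size": 2.9, "grade": 1},
]
relations = [nominal("color"), numerical("size", 0.3), ordered("grade")]
target = {0, 1, 2}  # indices of the objects in the target concept
lower, upper = multigranulation_approx(data, relations, target)
r = roughness(lower, upper)
print(sorted(lower), sorted(upper), round(r, 2))  # [0, 1] [0, 1, 2, 3, 4] 0.6
```

Here objects 0 and 1 enter the lower approximation only through the numerical granulation, which shows how the fused model can certify membership that no single other attribute type supports. The boundary region is {2, 3, 4}.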
Chinese keywords: Granular computing; rough sets; hybrid data; attribute reduction; classification
English abstract: In real-world applications, data often take hybrid forms that mix nominal, numerical, ordered and interval-valued attributes. Such data, with their complex structure and heterogeneous form, pose many challenges to traditional data analysis approaches. The main research content includes: (1) Analyzing the rough approximations of target concepts under various types of attributes, exploring approaches for fusing these different rough approximations, and constructing a multigranulation rough set model for hybrid data. (2) Presenting new definitions of roughness that measure the size and structure of the boundary region obtained from hybrid data. (3) Designing roughness-based evaluation algorithms for attribute subsets in hybrid data. (4) Proposing roughness-based approaches for generating decision trees applicable to hybrid data, and constructing a random forest for classifying hybrid data. The results of this project will provide new ways of discovering knowledge from hybrid data, and have theoretical significance and practical value for many areas, including data mining and machine learning.
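Item (3) pairs attribute subset evaluation with heuristic attribute reduction. The sketch below is a generic greedy forward reduction under stated assumptions, not the project's algorithm: it scores a subset by the size of its neighborhood positive region (a standard dependency measure used here in place of the project's roughness measure), supports only nominal and numerical attributes, and after each step drops the objects already in the positive region so that later rounds examine fewer objects while the attribute set grows. The names `related`, `positive_region` and `forward_reduct`, the threshold 0.3 and the toy table are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set

Obj = Dict[str, Any]


def related(x: Obj, y: Obj, attrs: List[str], kinds: Dict[str, str], delta: float) -> bool:
    # x and y are indistinguishable under attrs if every nominal value matches
    # and every numerical value differs by at most delta.
    for a in attrs:
        if kinds[a] == "nominal":
            if x[a] != y[a]:
                return False
        elif abs(x[a] - y[a]) > delta:  # any other kind is treated as numerical
            return False
    return True


def positive_region(data: List[Obj], objs: Set[int], attrs: List[str],
                    kinds: Dict[str, str], delta: float, dec: str) -> Set[int]:
    # Objects whose neighborhood (restricted to objs) is pure in the decision.
    return {
        i for i in objs
        if all(data[j][dec] == data[i][dec]
               for j in objs if related(data[i], data[j], attrs, kinds, delta))
    }


def forward_reduct(data: List[Obj], kinds: Dict[str, str], dec: str, delta: float) -> List[str]:
    universe = set(range(len(data)))
    attrs = list(kinds)
    goal = len(positive_region(data, universe, attrs, kinds, delta, dec))
    reduct: List[str] = []
    remaining = set(universe)  # objects still under evaluation
    covered = 0                # cumulative positive-region size
    while covered < goal:
        best: Optional[str] = None
        best_pos: Set[int] = set()
        for a in attrs:
            if a in reduct:
                continue
            pos = positive_region(data, remaining, reduct + [a], kinds, delta, dec)
            if best is None or len(pos) > len(best_pos):  # ties keep the earlier attribute
                best, best_pos = a, pos
        if best is None:  # every attribute is already selected
            break
        reduct.append(best)
        covered += len(best_pos)
        # Object-side shrinking: because the relation is symmetric, removing
        # objects already in the positive region does not change the status
        # of the remaining objects.
        remaining -= best_pos
    return reduct


kinds = {"color": "nominal", "size": "numerical", "shape": "nominal"}
data = [
    {"color": "red", "size": 1.0, "shape": "a", "d": "yes"},
    {"color": "red", "size": 1.1, "shape": "b", "d": "yes"},
    {"color": "blue", "size": 3.0, "shape": "a", "d": "no"},
    {"color": "blue", "size": 3.2, "shape": "b", "d": "no"},
    {"color": "red", "size": 1.2, "shape": "c", "d": "no"},
]
print(forward_reduct(data, kinds, "d", 0.3))  # ['color', 'shape']
```

In this toy table the first round evaluates all five objects and selects `color`, which settles objects 2 and 3; the second round only re-examines objects 0, 1 and 4, which is the kind of saving that shrinking the data in both directions aims for.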
English keywords: Granular computing; Rough set; Hybrid data; Attribute reduction; Classification