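Project title: Research on Probabilistic Data Mining Methods Based on Fuzzy Rough Sets
Project number: No. 61202114
Project type: Young Scientists Fund project
Approval year: 2013
Discipline: Computer Science
Project author: Zhao Suyun (赵素云)
Author affiliation: Renmin University of China (中国人民大学)
Funding amount: 240,000 CNY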
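Abstract (translated from Chinese): Randomness is the essential characteristic of probabilistic data, but it is not the only form of uncertainty. Probabilistic data exhibit not only randomness in different forms, but also semantic fuzziness in feature values and rough indiscernibility caused by incomplete information. Classical data mining methods do not take data uncertainty into account. Existing probabilistic data mining methods consider only one form of randomness and ignore other types and forms of uncertainty, which leads to a series of problems when they are applied to probabilistic data mining. This research attempts to move beyond the probabilistic data representation models based on the 'possible worlds model' that underlie current uncertain data management. It systematically analyzes uncertainties such as randomness, semantic fuzziness, and rough indiscernibility, and builds a data mining model that takes probabilistic databases as the research object, fuzzy rough sets as the research tool, and feature selection and rule-based classification as the goals. This yields a theoretical model and an algorithmic framework for data mining based on generalized random rough sets. The research draws on existing results on the construction of rough approximation operators in fuzzy rough set models and therefore has a sound research foundation. It will broaden the practical applicability of rough sets and provide a theoretical reference for extending other mining algorithms to probabilistic databases.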
Keywords (translated from Chinese): nested structure; fuzzy rough sets; classifier; association rule mining; uncertainty processing
English abstract: In real applications, probabilistic databases contain not only randomness but also fuzziness and roughness. Traditional data mining approaches, which assume that all data are certain, cannot effectively handle probabilistic databases. Recently proposed approaches for mining probabilistic data treat the uncertainty as randomness only, overlooking the fuzziness hidden in feature values and the indiscernibility arising from incomplete information. A theoretical framework and mining approach that handle several kinds of uncertainty in probabilistic databases are therefore needed. In place of the 'Possible Worlds' model, which is often used in probabilistic data management, this proposal attempts to develop a new framework named the generalized statistical rough model. Within this framework, fuzzy rough techniques are developed to represent and measure the weak information hidden in probabilistic databases, and approaches to feature selection and classifier construction on probabilistic databases are then proposed. The proposal generalizes rough set theory to a more general model, and it is the first to propose an uncertain data mining model that integrates randomness, fuzziness, and roughness.
English keywords: nested structure; fuzzy rough sets; classifier; association rule mining; uncertain information processing