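Project title: Research on Ensemble Pattern Mining Models and Concept Drift Detection Algorithms for Distributed Data Streams
Project number: No.60873145
Project type: General Program
Approval year: 2009
Project discipline: Mining Engineering
Project author: Mao Guojun (毛国君)
Author affiliation: Beijing University of Technology
Project funding: 300,000 yuan (CNY)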
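Chinese abstract (translated): Many applications exhibit the typical characteristics of distributed data streams. Compared with pattern mining on a single data stream, distributed data streams require a distributed mining architecture, which raises theoretical and methodological problems that need to be solved. This project studied formal methods for distributed data streams, ensemble pattern learning models for distributed data streams, node-level (single data stream) local pattern update algorithms, global pattern mining models and algorithms for distributed data streams, and concept drift mining algorithms for distributed data streams with uneven data arrival. Using techniques such as density grids, support vector machines, and micro-clusters, it addressed global classification and clustering over distributed data streams; using mathematical and artificial intelligence methods, it studied the formal representation of distributed data streams and the construction of mining models; using statistics and existing data mining techniques, it explored the construction of global pattern discovery models and algorithms for distributed data streams. Experiments show that the proposed methods adapt well to the needs of pattern mining on distributed data streams and achieve relatively high accuracy or efficiency.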
Chinese keywords (translated): distributed data streams; ensemble learning; concept drift; global classification; global clustering
English abstract: Many applications have the typical features of distributed data streams. Compared with a single data stream, a distributed data stream needs new mining frameworks that operate in a distributed way, which raises many new problems in theory and methods. We have studied several important problems in mining distributed data streams, including formal representation, ensemble learning, local model updating, global model mining, and concept drift in distributed data streams. Using density grids, SVM, and micro-clusters, we created global classification and clustering models; using mathematics and artificial intelligence, we constructed formal representation and mining models for distributed data streams; drawing on statistics and existing data mining methods, we designed global models and algorithms for mining distributed data streams. Experimental results demonstrate that the proposed methods help build mining models that are more accurate or more efficient than those offered by simple approaches.
English keywords: distributed data stream; ensemble learning; concept drifting; global classification; global clustering