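Project title: Research on Distributed Data Stream Mining Models and Algorithms Based on Data Distribution Evaluation and Support Vector Machine Methods
Project number: No.61273293
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology, Computer Technology
Project author: Mao Guojun (毛国君)
Affiliation: Central University of Finance and Economics (中央财经大学)
Project funding: 810,000 yuan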
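Chinese abstract: A distributed data stream is a set of related data streams distributed across different nodes. Mining distributed data streams has become a new research branch of data mining, and global pattern mining is one of its core problems. Global pattern mining is built on transmitting and integrating local data or patterns, so network transmission cost and mining accuracy are its two basic metrics. Reducing transmission cost means transmitting as little raw data as possible, whereas improving mining accuracy means exploiting as much useful information from the local data streams as possible; a good mining model should therefore seek high mining accuracy at a reasonable transmission cost. This project addresses global pattern mining over distributed data streams through data distribution evaluation and support vector machine methods. On the theoretical side, it studies global pattern mining models for distributed data streams and the theoretical foundations required for the corresponding pattern evolution. On the methodological side, targeting global pattern mining over distributed data streams, it studies effective distributed techniques for data distribution evaluation and support vector machines, and uses them to discover usable, small-volume learning samples. It designs the corresponding global classification and clustering algorithms for distributed data streams, and evaluates their accuracy and efficiency through theoretical analysis and experimental validation.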
Chinese keywords: data mining; distributed data stream; global pattern; data distribution evaluation; support vector machine
English abstract: A distributed data stream is a set of related data streams that occur at multiple nodes in a network. Mining distributed data streams has become a focus of data mining research, and discovering global patterns in a distributed data stream is an important issue. Mining global patterns requires collecting and transferring local data from local nodes in a distributed way, so a good method for mining distributed data streams should have a low transmission cost and a high mining precision. However, reducing the transmission cost means transferring less data from local nodes to the central node, whereas achieving a high mining precision means using more useful information from local nodes; mining distributed data streams is therefore a trade-off between transmission cost and mining precision. This project proposes to solve the problem of mining distributed data streams by evaluating data distributions and using SVM methods. It will study the theory and models of mining distributed data streams, explore effective methods in data distribution evaluation and SVMs for obtaining learning samples from data streams, and design efficient algorithms for classifying and clustering distributed data streams. These models and methods will also be tested in a series of experiments.
English keywords: data mining; distributed data stream; global pattern; data distribution evaluation; SVM