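Project title: Research on Rare-Category Data Mining in Big Data Environments
Project number: No.61502347
Project type: Young Scientists Fund
Year of approval: 2016
Discipline: Other
Project author: Huang Hao (黄浩)
Affiliation: Wuhan University (武汉大学)
Funding amount: 210,000 yuan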
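Abstract (translated from the Chinese): Rare categories in big data hold great potential value, and mining them often leads to important new discoveries and knowledge. However, a rare category has only a few data samples and is often hidden in a subspace of the dimensions of big data, which makes mining it difficult and challenging. Existing research on rare-category data mining tends to ignore this subspace property of rare categories in big data, and its algorithms are computationally expensive, which limits their usability on big data. This project therefore takes the big data environment as its research setting and the accurate, efficient mining of rare categories in massive high-dimensional data sets as its core goal. It will systematically study rare-category data mining algorithms that fit the characteristics and needs of real applications, including rare-category detection and classification algorithms for big data, and will address key scientific problems such as how to decompose big data reasonably, how to effectively obtain the search space for rare-category classification, and how to design dimension reduction techniques targeted at rare categories, so as to ensure the usability, efficiency, and performance of the proposed methods. In addition, the project plans to build a demonstration platform for rare-category data mining algorithms that integrates its main research results, to serve as a base platform for extending future research results to practical applications.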
Keywords (translated from the Chinese): rare category; big data; data mining; detection; classification
English abstract: Rare categories in big data have great potential value, since discovering them often brings important new findings and knowledge. However, a rare category has only a few data examples and often hides in a sub-feature space of big data, which makes mining rare categories difficult and challenging. Existing research on rare-category data mining takes no account of this sub-feature space and also requires substantial computation, both of which limit its usability on big data. Hence, this project focuses on the big data environment, aims at effectively and efficiently mining rare categories in very large high-dimensional data sets, and seeks to propose rare-category data mining algorithms that match the characteristics and requirements of real applications, including big-data-oriented rare-category detection and classification algorithms. To ensure the usability, efficiency, and performance of these algorithms, the project will address key scientific problems such as how to reasonably decompose big data, how to effectively find the search space for rare-category classification, and how to design dimension reduction techniques specifically for rare categories. Meanwhile, a demonstration platform integrating the main research results of this project will be built to serve as a basic platform for extending the research results to real-world applications.
English keywords: Rare Category; Big Data; Data Mining; Detection; Classification