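Project title: Research on Consensus Classification for Big Data and Its Applications
Project number: No.71201004
Project type: Young Scientists Fund
Approval year: 2013
Discipline: Management Science and Engineering
Project author: Xiong Haitao (熊海涛)
Affiliation: Beijing Technology and Business University (北京工商大学)
Funding: 190,000 RMB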
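Abstract (translated from the Chinese): Classification is a core area of data mining, an emerging interdisciplinary field, and is applied in many important domains such as business intelligence. Research shows that traditional classification algorithms, because they return only a single classification result, tend to produce inferior solutions. Ensemble learning addresses this to some extent by combining multiple classification results, but it does not consider utility optimality, and it cannot handle problems found in big data such as inconsistent samples. This project therefore studies the utility-optimal integration of multiple classification results on big data, that is, the "consensus classification" problem. Its core is to find, in the pattern space, the classification result that is most similar to the multiple base classifications. Its advantages are the robustness and accuracy of the result and its suitability for big data; its difficulty is that the problem itself is an NP-complete combinatorial optimization problem. Specifically, the project first establishes the theoretical foundation of consensus classification, then systematically studies the choice of utility function and the generation strategy for base classifications, then builds a consensus classification algorithm framework, and finally develops a system prototype suitable for parallel computing and carries out in-depth applied research on big data from business practice. The project is expected to supplement and advance both the theory and the applied practice of consensus classification.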
Keywords (translated from the Chinese): ensemble learning; big data; generation strategy; consistency; complex data
English abstract: Classification is a core field of data mining, an emerging interdisciplinary subject, and has been applied in many important areas such as business intelligence. Research shows that traditional classification algorithms produce only one solution, which is likely to be an inferior one. By combining different classification results, ensemble learning can solve this problem to a certain extent. However, it does not consider utility optimization. In addition, it cannot handle some problems inherent in big data, such as inconsistent data. To meet this challenge, this proposal aims to obtain an optimal integrated result with maximal utility from multiple classifications of big data, which can be defined as a consensus classification problem. It focuses on finding a single result in the pattern space that agrees as much as possible with the existing base classifications. Consensus classification is widely recognized as having merits in robustness, accuracy, and applicability to big data, but it has been proven to be an NP-complete problem. Specifically, this proposal will first establish the theoretical foundation of consensus classification, and then systematically study the choice of utility function and the generation strategy of base classifications. After
English keywords: Ensemble Learning; Big Data; Generation Scheme; Consistency; Complex Data
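
Illustrative note: the abstract describes consensus classification as finding one labeling that agrees as much as possible with several base classification results. The Python sketch below shows that idea under one loudly stated assumption: the utility is simply the number of per-sample label matches. The record does not specify the project's utility functions, and the names `agreement`, `total_utility`, and `consensus_by_vote` are hypothetical. Under this particular utility the optimum can be found one sample at a time, so it reduces to plurality voting and does not show the NP-complete difficulty the abstract mentions for the general problem.

```python
from collections import Counter


def agreement(candidate, base):
    """Count the positions where two label sequences carry the same label."""
    return sum(c == b for c, b in zip(candidate, base))


def total_utility(candidate, base_results):
    """Assumed utility: summed agreement between a candidate and every base result."""
    return sum(agreement(candidate, base) for base in base_results)


def consensus_by_vote(base_results):
    """Return the labeling that maximizes total_utility.

    The assumed utility decomposes per sample, so picking the most frequent
    label at each position is optimal. Ties go to the smallest label so the
    result is deterministic.
    """
    consensus = []
    for labels in zip(*base_results):
        counts = Counter(labels)
        best = max(counts.values())
        consensus.append(min(label for label, n in counts.items() if n == best))
    return consensus


# Three hypothetical base classifiers labeling the same four samples.
base_results = [
    ["a", "b", "a", "c"],
    ["a", "b", "b", "c"],
    ["b", "b", "a", "a"],
]
consensus = consensus_by_vote(base_results)
print(consensus, total_utility(consensus, base_results))
```

A different utility function, for example one that weights base classifiers or compares whole partitions, would generally not decompose this way and would need a search or approximation procedure instead of a per-sample vote.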