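Project title: Research on Robust Clustering Models and Algorithms for Multi-Source Big Data
Project number: No.61502289
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Automation technology, computer technology
Principal investigator: Du Liang (杜亮)
Affiliation: Shanxi University (山西大学)
Funding: 210,000 CNY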
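Chinese abstract (translated): Cluster analysis of multi-source big data is one of the important problems in big data research. Because the data are large in scale and drawn from many sources, multi-source big data clustering must contend with complex noise that is pervasive in the data. Existing methods approach multi-source clustering from different angles, such as multi-view clustering, clustering ensemble, multiple kernel clustering, and multi-relational clustering, but they cannot handle complex multi-source noise effectively. We propose a robust clustering method for multi-source big data that systematically addresses the challenges posed by such noise. Specifically: 1) two interdependent sub-problems, multi-source denoising and fusion clustering, are handled jointly in a unified framework; 2) the fusion clustering result guides multi-source denoising: the complex noise is characterized by two strategies, joint modeling of multi-source data reliability and joint extraction of multi-source noise, and the corresponding robust learning mechanisms, noise detection and noise correction, systematically reduce its interference; 3) consensus maximization learning is performed on the denoised data to achieve multi-source fusion clustering; 4) efficient algorithms that are easy to deploy on distributed computing platforms are designed to solve the multi-source big data robust clustering model; 5) the framework can be flexibly adjusted to handle different types of multi-source big data. This project will help improve the mining of the intrinsic value of big data.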
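Chinese keywords (translated): robust clustering;multi-view clustering;clustering ensemble;multiple kernel clustering;multi-relational clustering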
English abstract: Cluster analysis of multi-source big data is an important problem in big data research. It must cope with the challenges arising from multi-source noise with complex structure. Existing methods have been developed from different perspectives, such as multi-view clustering, clustering ensemble, multiple kernel clustering, and multi-relational clustering. These methods cannot handle such noise effectively. We propose a robust clustering framework that systematically addresses the challenges arising from multi-source noise with complex structure. Several aspects of the proposed approach are worth highlighting: 1) The two key sub-problems, i.e. joint multi-source noise reduction and joint multi-source clustering, are integrated into a unified framework that captures their interactions. 2) The joint multi-source clustering result is used to guide joint multi-source noise reduction. The complex noise in multi-source data can be captured either by joint modeling of multi-source data reliability or by joint extraction of multi-source noise. The adverse effect of multi-source noise can then be systematically alleviated by the corresponding robust learning mechanism, i.e. error detection or error correction. 3) Consensus maximization over the noise-reduced data is expected to yield better multi-source big data clustering. 4) An efficient, easy-to-deploy algorithm will be developed to perform multi-source big data clustering on a distributed computing platform. 5) The multi-source robust clustering framework can be flexibly adapted to different scenarios. This research will benefit the mining of big data.
English keywords: robust clustering;multi-view clustering;clustering ensemble;multiple kernel clustering;multi-relational clustering