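Project title: Constraint-Based Clustering of High-Dimensional Data
Project number: No.61272374
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology; Computer Technology
Project author: Zhang Xianchao (张宪超)
Affiliation: Dalian University of Technology (大连理工大学)
Funding: CNY 800,000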
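Chinese abstract (translated): Clustering is a fundamental task in data mining; it helps reveal the natural structure of data and plays an important role in many fields. The large volume of high-dimensional data produced in recent years poses a major challenge to traditional clustering algorithms, known as the curse of dimensionality. Its main manifestation is that, in high-dimensional data, different clusters correspond to different subspaces, and the two tasks of finding subspaces and finding clusters are circularly dependent. To break this circular dependency, existing algorithms usually make certain assumptions about the data set, and these assumptions do not hold in most cases. Through extensive preliminary research, we have recognized that constraint information can be used to break this circular dependency. However, research on constraint-based high-dimensional data clustering has only just begun; the few existing algorithms are all local improvements on existing unsupervised algorithms and have not shed their assumptions about the data set, so they do not truly solve the fundamental problem of circular dependency. Building on the progress of our previous research, this project introduces the concept of the correlation between constraints and subspaces to solve the circular dependency problem in high-dimensional data clustering, and applies constraints to every stage of clustering, in order to obtain high-quality constraint-based algorithms for high-dimensional data clustering, address the curse of dimensionality in high-dimensional data clustering, and lay the groundwork for an initial algorithmic and theoretical framework for constraint-based high-dimensional data clustering.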
Chinese keywords (translated): Clustering; High-Dimensional Data; Uncertain Data; Multi-view Clustering; Multi-task Clustering
English abstract: Clustering, which helps to reveal the natural structure of data, is a fundamental task in data mining and plays an important role in many fields. In recent years, massive amounts of high-dimensional data have been produced, posing a huge challenge, known as the curse of dimensionality, to traditional clustering algorithms. The challenge arises mainly because, in high-dimensional data, different clusters are embedded in different subspaces, and the tasks of finding subspaces and detecting clusters are circularly dependent. To break the circular dependency, existing algorithms usually make some assumptions about the data set. However, these assumptions do not hold in most situations. Through extensive preliminary studies, we have learned that constraint information can be used to break this kind of circular dependency. Nevertheless, research on constraint-based high-dimensional data clustering has only just begun. The few existing algorithms are all local improvements on existing unsupervised algorithms. They do not escape from making assumptions about the data set, and thus cannot truly break the circular dependency. In this project, building on our previous results, we introduce the concept of correlation between constraints and subspaces to solve the circular dependency problem of high-dimensional data clustering. We also apply constraints to al
English keywords: Clustering; High-Dimensional Data; Uncertain Data; Multi-view Clustering; Multi-task Clustering