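Project title: Research on Semi-Supervised Clustering Algorithms Based on Geometric Covering Methods
Project number: No.61302157
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Radio electronics and telecommunication technology
Project author: Gu Lei (顾磊)
Author affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)
Funding amount: 250,000 yuan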
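Chinese abstract: With the rapid development of information technology, every industry produces all kinds of data every day, and people who want to extract useful information from these data often turn to clustering techniques. Semi-supervised clustering is one such technique: it uses a small amount of supervision information to assist the clustering of a large amount of unlabeled data, and it has attracted the attention of many researchers in recent years. However, on the one hand, the performance of many current semi-supervised clustering algorithms is often unsatisfactory; on the other hand, some supervised and unsupervised learning algorithms based on geometric covering methods have already demonstrated excellent performance in practical applications. This project therefore introduces geometric covering methods into semi-supervised clustering and studies semi-supervised clustering algorithms based on them; the research has both theoretical significance and broad application prospects. In terms of content, the project is organized around three geometric covering methods: those based on hyperspheres and hyperellipsoids, those based on core-sets, and those based on level sets. In addition, the project will study how to select and use a small amount of supervision information. The project aims to enrich the family of semi-supervised clustering algorithms and to lay a solid foundation for their practical use.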
Chinese keywords: Clustering;Semi-supervised clustering;Geometric covering;Data mining;
English abstract: With the rapid development of information technology, every industry produces large amounts of data every day. People often apply clustering algorithms to mine useful information from large-scale data. Semi-supervised clustering, one family of clustering algorithms, uses a small amount of labeled data to aid the clustering process, and it has recently attracted the attention of many researchers. However, on the one hand, many semi-supervised clustering algorithms show no clear advantage over other clustering approaches; on the other hand, some classification and clustering techniques based on geometric covering methods have demonstrated their superiority. This project will therefore apply geometric covering to semi-supervised clustering, work that has both theoretical value and practical application prospects. The project will study not only three geometric covering methods, namely those related to hyperspheres and hyperellipsoids, core-sets, and level sets, but also how to select and use a small amount of labeled data. The expected outcome is a set of semi-supervised clustering algorithms suitable for practical application.
English keywords: Clustering;Semi-supervised clustering;Geometric covering methods;Data mining;