Similarity-based clustering methods partition data into clusters according to the pairwise similarity between data points, so the choice of similarity is crucial to their performance. In this paper, we propose Clustering by Discriminative Similarity (CDS), a novel method that learns a discriminative similarity for data clustering. CDS learns an unsupervised similarity-based classifier from each candidate data partition and searches for the optimal partition by minimizing the generalization error of the classifiers learned from those partitions. Through generalization analysis via Rademacher complexity, we express the generalization error bound for the unsupervised similarity-based classifier as a sum of discriminative similarities between data from different classes. We also prove that the derived discriminative similarity can be induced by the integrated squared error bound for kernel density classification. To evaluate the proposed discriminative similarity, we introduce a new clustering method that uses a kernel as the similarity function, CDS via unsupervised kernel classification (CDSK), and demonstrate its effectiveness experimentally.
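To make the idea of scoring a partition by the summed similarity between data from different classes concrete, the following is a minimal sketch, not the CDSK algorithm itself. It assumes a plain Gaussian kernel as a stand-in for the learned discriminative similarity, and the function names `gaussian_kernel` and `between_cluster_similarity`, the bandwidth, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(X, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix (an assumed stand-in similarity)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def between_cluster_similarity(K, labels):
    """Sum of similarities over unordered pairs assigned to different clusters."""
    labels = np.asarray(labels)
    different = labels[:, None] != labels[None, :]
    return 0.5 * K[different].sum()  # K is symmetric, so halve the double count


# Toy data (illustrative only): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
K = gaussian_kernel(X)

good_partition = np.repeat([0, 1], 20)  # labels follow the two blobs
bad_partition = np.tile([0, 1], 20)     # labels alternate, cutting through blobs

cost_good = between_cluster_similarity(K, good_partition)
cost_bad = between_cluster_similarity(K, bad_partition)
print(cost_good < cost_bad)
```

Under these assumptions, a partition that follows the true blobs leaves very little similarity between different clusters, while a partition that cuts through the blobs leaves a lot. The abstract describes a search over partitions that minimizes a bound of this general form, with the derived discriminative similarity in place of the fixed kernel used here.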