We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and finding the groups in the data. In contrast to most existing clustering algorithms our method does not require any hyper-parameters, distance thresholds and/or the need to specify the number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable and applicable to large practical problems. Evaluation on well known datasets from different domains ranging between 1077 and 8.1 million samples shows substantial performance gains when compared to the existing clustering techniques.
翻译:我们以单一集群方程式的形式提出了一种新的集群方法,能够直接发现数据中的组群。主要的建议是,每个样本的第一个相邻方需要发现大型链条和数据中的组群。与大多数现有的集群算法不同,我们的方法并不要求任何超参数、距离阈值和/或指定组群数量的需要。提议的算法属于等级组合法的组合。该技术的计算间接费用非常低,易于缩放,并适用于大型实际问题。对1077至8100万个样本不同领域众所周知的数据集的评估表明,与现有集群技术相比,其绩效有很大的提高。