We propose a novel nonparametric clustering algorithm that uses the interpoint distances among the data points to reveal the clustering structure inherent in a given data set. The density around each data point is estimated by applying the classical nonparametric univariate kernel density estimation method to the interpoint distances. The algorithm is simple to formulate and easy to apply, and it yields well-defined clusters. It starts with an objective selection of the initial cluster representative and always converges, independently of this choice. The method determines the number of clusters itself and, given an appropriate interpoint distance measure, can be used irrespective of the nature of the underlying data. The cluster analysis can be carried out in a space of any dimension and remains viable in high-dimensional settings. By design, the procedure does not require the distributions of the data or of their interpoint distances to be known; the only assumption is that the interpoint distances possess a density function. Data studies show its effectiveness and its superiority over widely used clustering algorithms.
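To make the central idea concrete, the following is a minimal sketch of how a univariate kernel density estimate over interpoint distances could score the density around each data point. It is an illustration under stated assumptions, not the procedure of the paper: the Gaussian kernel, the fixed `bandwidth`, the evaluation of the estimate at distance zero, the choice of the highest-scoring point as a candidate representative, and the function names `pairwise_distances` and `density_scores` are all hypothetical choices made here.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean interpoint distances (assumed here; any distance measure could be swapped in)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def density_scores(D, bandwidth):
    """For each point i, a Gaussian KDE of its distances to the other points, evaluated at distance 0.

    A large value means many other points lie within roughly one bandwidth of point i.
    """
    n = D.shape[0]
    K = np.exp(-0.5 * (D / bandwidth) ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(K, 0.0)  # leave out each point's zero distance to itself
    return K.sum(axis=1) / ((n - 1) * bandwidth)


# Synthetic data: two tight groups plus one isolated point (index 100).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
    [[20.0, 20.0]],
])

D = pairwise_distances(X)
scores = density_scores(D, bandwidth=0.5)  # bandwidth chosen by hand for this sketch

# Hypothetical selection rule: take the densest point as a candidate representative.
representative = int(np.argmax(scores))
```

Because only the distance matrix `D` enters `density_scores`, the same sketch applies to any data type for which a suitable distance can be computed, and the kernel estimate stays univariate regardless of the dimension of `X`.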