DP-means clustering was derived as an extension of $K$-means clustering. Although it is implemented with a simple and efficient algorithm, it estimates the number of clusters at the same time as the cluster assignments. However, DP-means is designed specifically for the average distortion measure. As a result, it is vulnerable to outliers in the data and can produce clusters with large maximum distortion. In this work, we extend the objective function of DP-means to $f$-separable distortion measures and propose a unified learning algorithm that addresses these problems through the choice of the function $f$. Further, we analyze the influence function of the estimated cluster center to evaluate its robustness against outliers. We demonstrate the performance of the generalized method through numerical experiments on real datasets.
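For orientation, the baseline that the abstract generalizes can be sketched as follows. This is a minimal illustration of standard DP-means under the average distortion measure, assuming squared Euclidean distance; it is not the generalized $f$-separable algorithm proposed in this work. The function name `dp_means` and the penalty parameter `lam` are hypothetical names chosen for this sketch.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Illustrative standard DP-means (assumes squared Euclidean distance).

    A point whose squared distance to every current center exceeds `lam`
    opens a new cluster, so the number of clusters is estimated on the fly.
    """
    X = np.asarray(X, dtype=float)
    centers = [X.mean(axis=0)]            # start from one cluster at the global mean
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if too far from all.
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            k = int(np.argmin(d2))
            if d2[k] > lam:
                centers.append(x.copy())
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: each center becomes the mean of its points;
        # clusters left empty are dropped and labels are renumbered.
        used = np.unique(labels)
        centers = [X[labels == k].mean(axis=0) for k in used]
        labels = np.searchsorted(used, labels)
        if not changed:
            break
    return np.array(centers), labels
```

Because each center is an arithmetic mean, a single distant point can either shift a center or, depending on `lam`, open a cluster of its own, which is the sensitivity to outliers that the abstract refers to. A smaller `lam` yields more clusters and a larger `lam` yields fewer.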