We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions, satisfying some mild assumptions. The main advantage of the proposed approach is a simple and computationally cheap update rule. Unlike previous methods that specialize to a specific formulation of the clustering problem, our approach is applicable to a wide range of costs, including non-Bregman clustering methods based on the Huber loss. We analyze the convergence of the proposed algorithm, and show that it converges to the set of appropriately defined fixed points, under arbitrary center initialization. In the special case of Bregman cost functions, the algorithm converges to the set of centroidal Voronoi partitions, which is consistent with prior works. Numerical experiments on real data demonstrate the effectiveness of the proposed method.
翻译:我们建议采用基于远程的集群的一般方法,使用成本函数的梯度,衡量集群任务和集群中心位置的集群质量。该方法是一个迭代的两步程序(集群任务和集束中心更新之间互换),适用于广泛的功能,满足一些温和的假设。拟议方法的主要优点是简单和计算成本低廉的更新规则。与以往专门专门处理集群问题具体表述的方法不同,我们的方法适用于范围广泛的成本,包括基于Huber损失的非Bregman组合方法。我们分析了拟议的算法的趋同,并表明它与一套定义适当的固定点相融合,在任意的中心初始化下。在布雷格曼成本功能的特殊情况下,算法与一套符合先前工作的环球型Voronoiois分区法相融合。关于实际数据的量化实验证明了拟议方法的有效性。