Recent progress in center-based clustering algorithms combats poor local minima by implicit annealing through a family of generalized means. These methods are variations of Lloyd's celebrated $k$-means algorithm and are most appropriate for spherical clusters, such as those arising from Gaussian data. In this paper, we connect these algorithmic advances to classical work on hard clustering under Bregman divergences, which are in bijection with exponential family distributions and are thus well suited to clustering objects arising from a breadth of data-generating mechanisms. The elegant properties of Bregman divergences allow us to maintain closed-form updates in a simple and transparent algorithm, and moreover lead to new theoretical arguments for establishing finite-sample bounds that relax the bounded-support assumption made in the existing state of the art. Additionally, we conduct thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms peer methods in a variety of non-Gaussian data settings.
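To make the setting concrete, the following is a minimal sketch of classical Lloyd-style hard clustering under a Bregman divergence, the baseline that the abstract builds on. It is not the proposed method: the annealing through generalized means is omitted. The function names (`bregman_hard_clustering`, `generalized_kl`, `squared_euclidean`), the random initialization, and the stopping rule are assumptions made for illustration. The sketch relies on one well-known property: for any Bregman divergence $d_\phi(x, \mu)$, the center minimizing the total divergence from a cluster's points is their arithmetic mean, so the center update stays in closed form whichever divergence is plugged in.

```python
import numpy as np


def squared_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2 (recovers k-means).
    return np.sum((x - mu) ** 2, axis=-1)


def generalized_kl(x, mu, eps=1e-12):
    # Bregman divergence generated by phi(x) = sum_i x_i log x_i,
    # a natural choice for nonnegative, count-like data.
    # eps is an illustrative guard against log(0), not part of the definition.
    x = np.maximum(x, eps)
    mu = np.maximum(mu, eps)
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)


def bregman_hard_clustering(X, k, divergence, n_iter=100, seed=0):
    """Lloyd-style alternation: assign by divergence, update by mean."""
    rng = np.random.default_rng(seed)
    # Illustrative initialization: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # dists[i, j] = d_phi(x_i, mu_j), computed by broadcasting.
        dists = divergence(X[:, None, :], centers[None, :, :])
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels
```

As a usage example, calling `bregman_hard_clustering(X, k=2, divergence=generalized_kl)` on an array `X` of nonnegative counts with shape `(n, d)` returns the `k` centers and a hard label for each row. Passing `squared_euclidean` instead recovers ordinary $k$-means.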