Several clustering methods (e.g., Normalized Cut and Ratio Cut) divide the Min Cut cost function by a cluster-dependent factor (e.g., the size or the degree of the clusters) in order to yield a more balanced partitioning. We instead investigate adding such a regularization to the original cost function. We first consider the case where the regularization term is the sum of the squared cluster sizes, and then generalize it to an adaptive regularization of the pairwise similarities. This amounts to (adaptively) shifting the pairwise similarities, which may make some of them negative. We then study the connection of this method to Correlation Clustering and propose an efficient local search optimization algorithm with a fast theoretical convergence rate to solve the new clustering problem. Further, we investigate the effect of shifting the pairwise similarities on some common clustering methods. Finally, we demonstrate the superior performance of the method through extensive experiments on different datasets.
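The link between the squared-size regularizer and a shift of the pairwise similarities can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it assumes the Min Cut cost is summed over ordered pairs (i, j) in different clusters, that the regularizer is lam times the sum of squared cluster sizes, and that self-pairs (i, i) count as within-cluster pairs. Under those assumptions the regularized cost differs from the negated within-cluster sum of the shifted similarities S[i][j] - lam only by a constant, so both objectives rank partitions identically. All function names and the parameter `lam` are hypothetical.

```python
import itertools
import random


def min_cut_cost(S, labels):
    """Sum of similarities over ordered pairs (i, j) lying in different clusters."""
    n = len(labels)
    return sum(S[i][j] for i in range(n) for j in range(n) if labels[i] != labels[j])


def regularized_cost(S, labels, lam):
    """Min Cut cost plus lam times the sum of squared cluster sizes (assumed form)."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    return min_cut_cost(S, labels) + lam * sum(m * m for m in sizes.values())


def shifted_cost(S, labels, lam):
    """Negated within-cluster sum of the shifted similarities S[i][j] - lam."""
    n = len(labels)
    return -sum(S[i][j] - lam for i in range(n) for j in range(n) if labels[i] == labels[j])


def random_similarity(n, seed=0):
    """Random symmetric similarity matrix with a zero diagonal (toy data)."""
    rng = random.Random(seed)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = rng.random()
    return S


if __name__ == "__main__":
    S = random_similarity(6)
    lam = 0.3
    total = sum(map(sum, S))  # constant offset between the two objectives
    # Enumerate every assignment of 6 objects to at most 3 clusters.
    for labels in itertools.product(range(3), repeat=6):
        gap = regularized_cost(S, labels, lam) - shifted_cost(S, labels, lam)
        assert abs(gap - total) < 1e-9
    print("Both objectives differ by the same constant for every partition.")
```

Because the gap equals the total similarity mass regardless of the partition, minimizing the regularized Min Cut cost and minimizing the shifted objective select the same clustering in this toy setting. A shift larger than some S[i][j] makes that shifted similarity negative, which is the situation where the connection to Correlation Clustering becomes relevant.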