Correlation clustering is a widely studied framework for clustering based on pairwise similarity and dissimilarity scores, but its best approximation algorithms rely on impractical linear programming relaxations. We present faster approximation algorithms that avoid these relaxations, for two well-studied special cases: cluster editing and cluster deletion. We accomplish this by drawing new connections to edge labeling problems related to the principle of strong triadic closure. This leads to faster and more practical linear programming algorithms, as well as extremely scalable combinatorial techniques, including the first combinatorial approximation algorithm for cluster deletion. In practice, our algorithms produce approximate solutions that nearly match the best algorithms in quality, while scaling to problems that are orders of magnitude larger.
翻译:关联集群是一个基于双向相似性和差异性分数进行分组的广泛研究的框架,但其最佳近似算法依赖于不切实际的线性编程松动。 我们为两个研究周密的特殊案例提供了避免这些松动的更快近似算法:集群编辑和集群删除。 我们通过绘制新的连接来达到这个目的,从而在与强烈三重封闭原则有关的问题上划清界线。 这导致更快和更加实用的线性编程算法,以及极其可缩放的组合组合技术,包括首个用于删除集群的组合近似算法。 实际上,我们的算法产生了接近质量最佳算法的近似解决方案,同时将范围扩大到规模更大的问题。