Correlation clustering is a framework for partitioning datasets based on pairwise similarity and dissimilarity scores, and it has been used in diverse applications in bioinformatics, social network analysis, and computer vision. Although many approximation algorithms have been designed for this problem, the best theoretical results rely on obtaining lower bounds via expensive linear programming relaxations. In this paper, we prove new relationships between correlation clustering problems and edge labeling problems related to the principle of strong triadic closure. We use these connections to develop new approximation algorithms for correlation clustering that have deterministic constant-factor approximation guarantees and avoid the canonical linear programming relaxation. Our approach also extends to a variant of correlation clustering called cluster deletion, which strictly prohibits placing negative edges inside clusters. Our results include 4-approximation algorithms for cluster deletion and correlation clustering, based on simplified linear programs with far fewer constraints than the canonical relaxations. More importantly, we develop faster techniques that are purely combinatorial, based on computing maximal matchings in certain auxiliary graphs and hypergraphs. This leads to a combinatorial 6-approximation for complete unweighted correlation clustering, which is the best deterministic result for any method that does not rely on linear programming. We also present the first combinatorial constant-factor approximation for cluster deletion.
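The matching idea can be illustrated with a minimal sketch. It assumes the auxiliary graph has one node per edge of the input graph and joins two edges when they form an open wedge (i-j and j-k are edges but i-k is not); strong triadic closure requires at least one edge of every open wedge to be labeled weak. Any cluster deletion solution induces such a labeling (an open wedge cannot lie inside one clique, so one of its edges is cut), so the size of any matching in the wedge graph lower-bounds the optimal cluster deletion cost. Labeling both endpoints of every matched pair weak gives a valid labeling within a factor of 2 of that bound. The second function shows the analogous hypergraph idea for complete unweighted correlation clustering: an edge-disjoint packing of "bad triangles" (two positive pairs, one negative) is a lower bound, since every clustering makes at least one mistake on each. The function names are hypothetical, greedy matching is just one simple way to obtain a maximal matching, and the rounding steps that turn these labelings into clusterings are not shown; this is not the paper's implementation.

```python
from itertools import combinations


def edge_key(u, v):
    """Canonical (smaller, larger) representation of an undirected pair."""
    return (u, v) if u < v else (v, u)


def open_wedges(n, edges):
    """List open wedges (i, j, k) centered at j: ij and jk are edges, ik is not."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    wedges = []
    for j in range(n):
        for i, k in combinations(sorted(adj[j]), 2):
            if k not in adj[i]:
                wedges.append((i, j, k))
    return wedges


def stc_matching_labeling(n, edges):
    """Hypothetical helper: greedy maximal matching in the wedge graph.

    Returns (weak_edges, lower_bound). Every open wedge contains a weak edge,
    len(weak_edges) == 2 * lower_bound, and lower_bound (the matching size)
    is at most the minimum STC labeling cost, hence at most the optimal
    cluster deletion cost.
    """
    weak = set()
    matching_size = 0
    for i, j, k in open_wedges(n, edges):
        e1, e2 = edge_key(i, j), edge_key(j, k)
        if e1 not in weak and e2 not in weak:
            # Neither edge is matched yet, so this wedge joins the matching.
            weak.update((e1, e2))
            matching_size += 1
    return weak, matching_size


def bad_triangle_packing(n, pos_edges):
    """Hypothetical helper: greedy edge-disjoint packing of bad triangles.

    All pairs not in pos_edges are treated as negative (complete, unweighted
    instance). The number of packed triangles lower-bounds the number of
    disagreements made by any clustering.
    """
    pos = {edge_key(u, v) for u, v in pos_edges}
    used = set()
    packing = []
    for a, b, c in combinations(range(n), 3):
        pairs = [(a, b), (a, c), (b, c)]
        if sum(p in pos for p in pairs) == 2 and not used.intersection(pairs):
            packing.append((a, b, c))
            used.update(pairs)
    return packing


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    print(stc_matching_labeling(4, path))
    print(bad_triangle_packing(4, path))
```

On the path 0-1-2-3 both bounds equal 1, which matches the optimum of splitting the path into {0, 1} and {2, 3}; the matching-based labeling marks two edges weak, consistent with its factor-2 guarantee.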