We present a new approach for solving (minimum disagreement) correlation clustering that yields sublinear algorithms with highly efficient time and space complexity for this problem. In particular, we obtain the following algorithms for $n$-vertex $(+/-)$-labeled graphs $G$:

-- A sublinear-time algorithm that with high probability returns a constant-approximation clustering of $G$ in $O(n\log^2{n})$ time, assuming access to the adjacency list of the $(+)$-labeled edges of $G$ (this is almost quadratically faster than even reading the input once). Previously, no sublinear-time algorithm was known for this problem with any multiplicative approximation guarantee.

-- A semi-streaming algorithm that with high probability returns a constant-approximation clustering of $G$ in $O(n\log{n})$ space and a single pass over the edges of the graph $G$ (this memory is almost quadratically smaller than the input size). Previously, no single-pass algorithm with $o(n^2)$ space was known for this problem with any approximation guarantee.

The main ingredient of our approach is a novel connection to sparse-dense graph decompositions, which are used extensively in the graph coloring literature. To our knowledge, this connection is the first application of these decompositions beyond graph coloring, in particular to the correlation clustering problem, and may be of independent interest.
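For readers unfamiliar with the objective, the following Python sketch illustrates what "minimum disagreement" measures. It is not the algorithm of this work. It only evaluates a given clustering, and it assumes the standard setting in which every vertex pair not joined by a $(+)$ edge is labeled $(-)$. The names `disagreements`, `pos_adj`, and `cluster_of` are hypothetical and chosen for illustration.

```python
from collections import Counter


def disagreements(pos_adj, cluster_of):
    """Count the disagreements of a clustering.

    Assumption: the graph is complete, and every pair that is not a
    (+) edge is a (-) edge.

    pos_adj:    dict mapping each vertex to the set of its (+)-neighbors
                (symmetric: v in pos_adj[u] iff u in pos_adj[v]).
    cluster_of: dict mapping each vertex to its cluster id.
    """
    pos_inside = 0  # (+) edges with both endpoints in one cluster
    pos_across = 0  # (+) edges cut by the clustering (disagreements)
    for u, nbrs in pos_adj.items():
        for v in nbrs:
            if cluster_of[u] == cluster_of[v]:
                pos_inside += 1
            else:
                pos_across += 1
    # Each undirected edge was seen once from each endpoint.
    pos_inside //= 2
    pos_across //= 2

    # Every intra-cluster pair that is not a (+) edge is a (-) edge,
    # and each such pair is a disagreement.
    sizes = Counter(cluster_of.values())
    pairs_inside = sum(s * (s - 1) // 2 for s in sizes.values())
    neg_inside = pairs_inside - pos_inside

    return pos_across + neg_inside


# Toy instance: a (+) triangle on {0, 1, 2} plus the (+) edge 2-3.
pos_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle_plus_singleton = {0: "a", 1: "a", 2: "a", 3: "b"}
one_cluster = {v: "a" for v in pos_adj}
singletons = {v: v for v in pos_adj}
```

On this toy instance, clustering the triangle together and leaving vertex 3 alone cuts only the single $(+)$ edge 2-3, whereas merging everything pays for the two $(-)$ pairs 0-3 and 1-3. Note that this evaluation reads only the $(+)$ adjacency lists, the same access model the sublinear-time result assumes.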