Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.
翻译:关联组群是未经监督的学习的一个中心主题,在 ML 和数据挖掘中有许多应用。 在相关组群中,人们会收到一个签名的图表作为输入,目的是分割它,以尽量减少分歧的数量。在这项工作中,我们建议对这一问题进行大规模平行的计算算法,该算法比先前的工作要快得多。特别是,我们的算法使用图中节点数量中含有内存子线的机器,并返回一个恒定近似值,同时只运行数个不变的回合。据我们所知,我们的算法是第一个能够利用子线性记忆系统中仅使用恒定数的 MPC 圆盘在图表中估计聚集问题的算法。我们用对技术的实验性分析来补充我们的分析。