Correlation Clustering is an important clustering problem with many applications. We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering that has been corrupted by random noise and adversarial modifications. Concerning the latter, there is a standard "post-adversarial" model in the literature, in which adversarial modifications come after the noise. Here, we introduce and analyse a "pre-adversarial" model in which adversarial modifications come before the noise. Given an input coming from such a semi-adversarial generative model, the goal is to reconstruct almost perfectly and with high probability the latent clustering. We focus on the case where the hidden clusters have nearly equal size and show the following. In the pre-adversarial setting, spectral algorithms are optimal, in the sense that they reconstruct all the way to the information-theoretic threshold beyond which no reconstruction is possible. This is in contrast to the post-adversarial setting, in which their ability to restore the hidden clusters stops before the threshold, but the gap is optimally filled by SDP-based algorithms. These results highlight a heretofore unknown robustness of spectral algorithms, showing them less brittle than previously thought.
翻译:我们研究这一问题的重建版本, 试图重建被随机噪音和对抗性修改所腐蚀的潜在集群。 关于后者, 文献中有一个标准的“ 后对抗性” 模型, 对抗性修改在噪音之后出现。 在这里, 我们引入并分析一个“ 对抗前” 模型, 对抗性修改在噪音之前出现。 鉴于这种半对抗性变异模型的输入, 目标是重建几乎完美且极有可能的潜在集群。 我们侧重于隐藏的集群的大小几乎相等并显示以下内容的情况。 在对抗前的设置中, 光谱算法是最佳的, 也就是说它们可以重建到信息理论门槛, 超过这个门槛是不可能重建的。 这与后对抗性环境形成对照, 后者在临界点之前能够恢复隐藏的集群, 但差距最好由基于 SDP 的算法填补 。 这些结果凸显出远未为人所知的光谱算法的强度, 显示它们比以前想象的要小得多。