Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra--Lenstra--Lovasz lattice basis reduction method which achieves the statistically-optimal sample complexity of d+1 samples. This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.
翻译:集成是未经监督的学习的基本原始, 从而产生大量在计算上质疑的计算性测算任务。 在这项工作中, 我们侧重于将 d- 维高斯 混合物与未知的( 可能退化的) 共变( 未知的) 共变组合的简单任务。 最近的工作( Ghosh 等人 20; Mao, Wein'21; Davis, Diaz, Wang'21) 已经针对低度多元度方法的等级和在高斯群集中安装的某些隐藏结构的方程式( So- S) 等级建立了较低的界限。 在很多类似的推断任务中, 先前的工作显示存在一个内在的统计- 共变异性混合物混合物混合物混合物混合物的组合, 也就是说, 在统计学上可能存在但不会成功但不会成功。 我们所考虑的组合任务的一个特殊案例是, 在一个随机的子空间中找到一个高立超立方体矢量矢量的矢量矢量矢量矢量矢量。 我们显示, 这个特定的类组模型甚至可以延低度的统计级的 直达这个统计- 直判算法基础, 直达直判的 直判法基础。