We establish optimal Statistical Query (SQ) lower bounds for robustly learning certain families of discrete high-dimensional distributions. In particular, we show that no efficient SQ algorithm with access to an $\epsilon$-corrupted binary product distribution can learn its mean within $\ell_2$-error $o(\epsilon \sqrt{\log(1/\epsilon)})$. Similarly, we show that no efficient SQ algorithm with access to an $\epsilon$-corrupted ferromagnetic high-temperature Ising model can learn the model to total variation distance $o(\epsilon \log(1/\epsilon))$. Our SQ lower bounds match the error guarantees of known algorithms for these problems, providing evidence that current upper bounds for these tasks are best possible. At the technical level, we develop a generic SQ lower bound for discrete high-dimensional distributions starting from low dimensional moment matching constructions that we believe will find other applications. Additionally, we introduce new ideas to analyze these moment-matching constructions for discrete univariate distributions.
翻译:我们建立了最佳的统计查询( SQ) 下限, 以便强有力地学习离散高维分布体的某些家族。 特别是, 我们显示, 无法使用 $\ epsilon$( 1/\\ epsilon) 的高效 SQ 算法, 无法在 $@ error$( =2$) 的范围内学习其平均值。 同样, 我们显示, 无法使用 $\ epsilon $- corrupted 铁磁高温分布体模型的高效 SQ 算法, 来学习总变异距离 $(\ epsilon\ log( 1/\ epsilon)) $ 的有效 SQ 算法 。 我们的 SQ 下限匹配了已知这些问题算法的错误保证, 提供了这些任务当前上限的最佳证据 。 在技术层面上, 我们开发了一个通用的 SQ 下限, 用于离散高维分布的常规的 SQ 。 从低维值与我们认为会找到其他应用程序的构造匹配 。 此外, 我们引入新的想法来分析这些时段的离位分配结构 。