We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $\mu$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\widehat \mu$ such that $\| \widehat \mu - \mu \|_2$ is small. Prior work has studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample- and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with ``certifiably bounded'' $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$ with quasi-polynomial sample and computational complexity. We complement our upper bounds with nearly matching statistical query and low-degree polynomial testing lower bounds.
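To make the input model concrete, the following is a minimal sketch that generates one instance of the problem, assuming NumPy is available. The function name `make_instance`, the identity-covariance Gaussian inliers, and the single decoy cluster used for the outliers are illustrative assumptions rather than anything specified in the paper. In the actual model the outliers are arbitrary and may be chosen adversarially. The sketch only produces data and does not implement the estimation algorithm.

```python
import numpy as np


def make_instance(m, n, k, alpha, seed=0):
    """Generate m points in R^n, floor(alpha * m) of which are inliers.

    Hypothetical helper for illustration only. Inliers are drawn from
    N(mu, I) with a k-sparse mean mu. Outliers here form one decoy
    cluster, which is just one of many possible (arbitrary) choices.
    """
    rng = np.random.default_rng(seed)

    # Unknown k-sparse mean: k nonzero coordinates on a random support.
    mu = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    mu[support] = rng.normal(size=k)

    # Inliers: floor(alpha * m) i.i.d. samples with mean mu.
    num_inliers = int(np.floor(alpha * m))
    inliers = mu + rng.normal(size=(num_inliers, n))

    # Outliers (assumption): a decoy cluster around a different sparse
    # vector. The model places no restriction on these points.
    decoy = np.zeros(n)
    decoy[rng.choice(n, size=k, replace=False)] = 5.0
    outliers = decoy + rng.normal(size=(m - num_inliers, n))

    # Shuffle so that position reveals nothing about inlier status.
    points = np.vstack([inliers, outliers])
    is_inlier = np.zeros(m, dtype=bool)
    is_inlier[:num_inliers] = True
    perm = rng.permutation(m)
    return points[perm], is_inlier[perm], mu


points, is_inlier, mu = make_instance(m=200, n=50, k=5, alpha=0.25)
```

Because the inliers are a minority, the outliers can form clusters that look like equally plausible populations, so no single estimate can be guaranteed to be close to $\mu$. This is why the goal is a small list of candidates, at least one of which is close to $\mu$.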