We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $\mu$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability. Prior work obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $\tau$, having an additive $\log(1/\tau)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
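To make the problem setup concrete, the following Python sketch generates heavy-tailed samples with a sparse mean, corrupts a small fraction of them, and compares the empirical mean against a naive baseline. This is not the estimator proposed in this work: the baseline (a coordinate-wise trimmed mean followed by keeping the $k$ largest-magnitude coordinates) is a simple stand-in chosen for illustration. The function names, the Student-t noise model, the constant-valued outliers, and all parameter values are assumptions that do not come from the source.

```python
import numpy as np


def make_corrupted_samples(n, d, k, eps, seed=0):
    """Draw n heavy-tailed samples in R^d with a k-sparse mean, then corrupt an eps-fraction."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[:k] = 1.0  # hypothetical k-sparse mean (assumption for illustration)
    # Student-t noise with 3 degrees of freedom: finite variance, heavy tails.
    samples = mu + rng.standard_t(df=3, size=(n, d))
    num_outliers = int(eps * n)
    samples[:num_outliers] = 50.0  # crude outliers; a real adversary would be subtler
    return samples, mu


def trimmed_mean(samples, trim_fraction):
    """Coordinate-wise mean after dropping the smallest and largest trim_fraction of values."""
    n = samples.shape[0]
    t = int(trim_fraction * n)
    ordered = np.sort(samples, axis=0)  # sorts each coordinate independently
    return ordered[t:n - t].mean(axis=0)


def keep_top_k(vector, k):
    """Zero out all but the k largest-magnitude coordinates."""
    out = np.zeros_like(vector)
    top = np.argsort(np.abs(vector))[-k:]
    out[top] = vector[top]
    return out


def naive_sparse_mean(samples, k, eps):
    """Naive baseline only: trimmed mean per coordinate, then hard thresholding to k entries."""
    return keep_top_k(trimmed_mean(samples, 2 * eps), k)


samples, mu = make_corrupted_samples(n=2000, d=200, k=5, eps=0.05)
estimate = naive_sparse_mean(samples, k=5, eps=0.05)
print("empirical mean error:", np.linalg.norm(samples.mean(axis=0) - mu))
print("naive baseline error:", np.linalg.norm(estimate - mu))
```

The baseline carries none of the guarantees claimed above and is meant only to fix notation: `eps` is the corruption fraction, `k` the sparsity of $\mu$, and `d` the ambient dimension.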