In this paper, we propose a sample complexity bound for learning a simplex from noisy samples. A dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^K$, where the samples are assumed to be corrupted by additive Gaussian noise of arbitrary magnitude. We propose a strategy that outputs a simplex having, with high probability, a total variation distance of $\epsilon + O\left(\mathrm{SNR}^{-1}\right)$ from the true simplex, for any $\epsilon>0$. We prove that to come this close to the true simplex, it is sufficient to have $n\ge\tilde{O}\left(K^2/\epsilon^2\right)$ samples. Here, SNR stands for the signal-to-noise ratio, which can be viewed as the ratio of the diameter of the simplex to the standard deviation of the noise. Our proofs are based on recent advancements in sample compression techniques, which have already shown promise in deriving tight bounds for density estimation in high-dimensional Gaussian mixture models.