We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=\sum_{i=1}^k w_i f_i, \quad\sum_{i=1}^k w_i=1, \quad w_i>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on the $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$ for $i\neq j$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required to estimate each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or small weights $w_i$, and unlike nonparametric density estimation it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. To show that this lower bound is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem lies strictly between polynomial and exponential, which is uncommon in learning theory.
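For concreteness, the convolution assumption can be written out as follows (a standard formulation; the noise scale $\sigma$ and the density $\varphi_\sigma$ are our notation here, since the abstract only specifies "a Gaussian"): $$ f_i(x) = (\nu_i * \varphi_\sigma)(x) = \int \varphi_\sigma(x-y)\, d\nu_i(y), \qquad \varphi_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)}. $$ Equivalently, a sample from $f$ is generated as $X = Y + \sigma Z$, where a component index $I$ is drawn with $\mathbb{P}(I=i)=w_i$, then $Y \sim \nu_I$ and $Z \sim \mathcal{N}(0,1)$ independently; the disjointness of the supports $\text{supp}(\nu_i)$ is what makes the components identifiable despite the Gaussian smoothing.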