We study the task of learning latent-variable models. An obstacle towards designing efficient algorithms for such models is the necessity of approximating moment tensors of super-constant degree. Motivated by such applications, we develop a general efficient algorithm for implicit moment tensor computation. Our algorithm computes in $\mathrm{poly}(d, k)$ time a succinct approximate description of tensors of the form $M_m=\sum_{i=1}^{k}w_iv_i^{\otimes m}$, for $w_i\in\mathbb{R}_+$ (even for $m=\omega(1)$), assuming there exists a polynomial-size arithmetic circuit whose expected output on an appropriate samplable distribution is equal to $M_m$, and whose covariance on this input is bounded. Our framework broadly generalizes the work of~\cite{LL21-opt}, which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians. By leveraging our general algorithm, we obtain the first polynomial-time learners for the following models.

* Mixtures of Linear Regressions. We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm for this task. The best previously known algorithm has complexity super-polynomial in $k$.

* Learning Mixtures of Spherical Gaussians. We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time density estimation algorithm, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. Prior algorithms incur super-polynomial complexity in $k$. We also give a $\mathrm{poly}(d, k, 1/\epsilon)$-time parameter estimation algorithm, under the {\em optimal} mean separation of $\Omega(\log^{1/2}(k/\epsilon))$.

* PAC Learning Sums of ReLUs. We give a learner with complexity $\mathrm{poly}(d, k) \cdot 2^{\mathrm{poly}(1/\epsilon)}$. This is the first algorithm for this task that runs in $\mathrm{poly}(d, k)$ time for sub-constant values of $\epsilon = o_{k, d}(1)$.
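For concreteness, the sketch below (a small synthetic instance; all names are illustrative and not taken from the paper) naively materializes the explicit moment tensor $M_m=\sum_{i=1}^{k}w_iv_i^{\otimes m}$. The explicit tensor has $d^m$ entries, so this direct construction is only feasible for constant $m$; the algorithm described above instead outputs a succinct implicit description, which is what makes the regime $m=\omega(1)$ tractable.

```python
import numpy as np

def explicit_moment_tensor(weights, vectors, m):
    """Naively materialize M_m = sum_i w_i * v_i^{(tensor) m}.

    Requires O(k * d^m) time and memory -- exactly the cost that an
    implicit (succinct) representation is meant to avoid when m grows.
    """
    k, d = vectors.shape
    M = np.zeros((d,) * m)
    for w, v in zip(weights, vectors):
        T = w  # scalar weight; repeated outer products build v^{tensor m}
        for _ in range(m):
            T = np.multiply.outer(T, v)
        M += T
    return M

# Tiny example: k = 3 components in d = 4 dimensions, degree m = 3.
rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(3))          # nonnegative mixing weights w_i
V = rng.normal(size=(3, 4))            # component directions v_i
M3 = explicit_moment_tensor(w, V, 3)   # tensor of shape (4, 4, 4)
print(M3.shape)
```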