Many works in signal processing and learning theory operate under the assumption that the underlying model is simple, e.g., that a signal is approximately $k$-Fourier-sparse or that a distribution can be approximated by a mixture model with at most $k$ components. However, the problem of fitting the parameters of such a model becomes more challenging when the frequencies/components are too close together. In this work we introduce new methods for sparsifying sums of exponentials and give various algorithmic applications. First, we study Fourier-sparse interpolation without a frequency gap, where Chen et al. gave an algorithm for finding an $\epsilon$-approximate solution that uses $k' = \mbox{poly}(k, \log 1/\epsilon)$ frequencies. Second, we study learning Gaussian mixture models in one dimension without a separation condition, where kernel density estimators give an $\epsilon$-approximation that uses $k' = O(k/\epsilon^2)$ components. Both methods output models that are far more complex than the one we started with. We show how to post-process to reduce the number of frequencies/components down to $k' = \widetilde{O}(k)$, which is optimal up to logarithmic factors. Moreover, we give applications to model selection. In particular, we give the first algorithms for approximately (and robustly) determining the number of components in a Gaussian mixture model that work without a separation condition.
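As a loose illustration of the kernel-density-estimation step, here is a minimal Python sketch: drawing $m = O(k/\epsilon^2)$ samples from a 1-D Gaussian mixture and placing a Gaussian kernel at each sample yields an $m$-component mixture whose density is close to the original, i.e., the intermediate, overly complex model that sparsification would then shrink. The mixture parameters, bandwidth, and the constant in $m$ are assumptions chosen for the demo, not the construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a k = 3 component 1-D Gaussian mixture (weights, means, std devs).
weights = np.array([0.5, 0.3, 0.2])
means   = np.array([-1.0, 0.0, 1.5])
stds    = np.array([0.5, 0.4, 0.6])

def mixture_pdf(x, w, mu, sigma):
    """Density of a 1-D Gaussian mixture, evaluated at the points x."""
    comps = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return comps @ w

eps = 0.1
k = len(weights)
m = int(k / eps**2)  # number of KDE components, m = O(k / eps^2)

# Sample m points from the target mixture.
idx = rng.choice(k, size=m, p=weights)
samples = rng.normal(means[idx], stds[idx])

# KDE = uniform mixture of m Gaussians centered at the samples.
# The bandwidth here is a heuristic choice, not the one from the paper.
bandwidth = 0.3
kde_weights = np.full(m, 1.0 / m)

# Compare the two densities on a grid via an approximate L1 distance.
grid = np.linspace(-5.0, 5.0, 2000)
p_true = mixture_pdf(grid, weights, means, stds)
p_kde  = mixture_pdf(grid, kde_weights, samples, np.full(m, bandwidth))
l1 = np.abs(p_true - p_kde).sum() * (grid[1] - grid[0])

print(f"KDE with m = {m} components, approximate L1 error: {l1:.3f}")
```

The point of the sketch is only that the intermediate model has $m \gg k$ components; the paper's contribution is the post-processing that brings this back down to $\widetilde{O}(k)$.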