Parametric density estimation, for example as a Gaussian distribution, is a cornerstone of statistics. Machine learning requires inexpensive estimation of much more complex densities, whereas the standard approach, maximum likelihood estimation (MLE), is relatively costly. We discuss inexpensive density estimation, for example literally fitting a polynomial (or Fourier series) to the sample, whose coefficients are obtained by simply averaging monomials (or sines/cosines) over the sample. Another basic application discussed is fitting a distortion of some standard distribution such as the Gaussian, analogously to ICA, but additionally allowing reconstruction of the distorted density. Finally, by using weighted averages, the method can also be applied to estimate non-probabilistic densities, such as mass distributions, or to various clustering problems by using negative (or complex) weights: fitting a function whose sign (or argument) determines the clusters. The estimated parameters approach their optimal values with error decreasing like $1/\sqrt{n}$, where $n$ is the sample size.
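The polynomial fit mentioned above can be sketched as follows: a minimal illustration (not the paper's exact formulation) that represents a density on $[0,1]$ in an orthonormal basis of shifted Legendre polynomials, so each coefficient is literally the average of a basis function over the sample — no iterative optimization as in MLE. The function name `fit_density` and the choice of basis are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def fit_density(sample, degree=4):
    """Estimate a density on [0,1] as a series in orthonormal shifted
    Legendre polynomials b_j(x) = sqrt(2j+1) * P_j(2x - 1).

    Each coefficient a_j is just the sample mean of b_j(x_i), so fitting
    costs a single pass over the data.  (Illustrative sketch.)
    """
    t = 2.0 * np.asarray(sample) - 1.0                  # map [0,1] -> [-1,1]
    norms = np.sqrt(2.0 * np.arange(degree + 1) + 1.0)  # orthonormalization on [0,1]
    eye = np.eye(degree + 1)
    # a_j = mean over the sample of the j-th orthonormal basis function
    coeffs = np.array([norms[j] * legval(t, eye[j]).mean()
                       for j in range(degree + 1)])

    def rho(x):
        # density estimate: sum_j a_j * b_j(x)
        tt = 2.0 * np.asarray(x) - 1.0
        return legval(tt, coeffs * norms)

    return coeffs, rho
```

Because the constant basis function is $b_0 \equiv 1$, the leading coefficient is always exactly 1 and the estimate automatically integrates to 1; the remaining coefficients fluctuate around their optimal values with the $1/\sqrt{n}$ error stated above.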