Given $iid$ observations from an unknown absolute continuous distribution defined on some domain $\Omega$, we propose a nonparametric method to learn a piecewise constant function to approximate the underlying probability density function. Our density estimate is a piecewise constant function defined on a binary partition of $\Omega$. The key ingredient of the algorithm is to use discrepancy, a concept originates from Quasi Monte Carlo analysis, to control the partition process. The resulting algorithm is simple, efficient, and has a provable convergence rate. We empirically demonstrate its efficiency as a density estimation method. We present its applications on a wide range of tasks, including finding good initializations for k-means.
翻译:鉴于在某个域上定义的未知的绝对连续分布的观测结果($\Omega$),我们提出一种非参数方法,以学习一个小数常数函数来接近潜在概率值函数。我们的密度估计是一个小数常数函数,在$\Omega$的二进制分割法上定义。算法的关键成分是使用差异,一个概念起源于Qasi Monte Carlo分析,以控制分割过程。由此产生的算法简单、高效,并具有可辨别的趋同率。我们用经验证明它作为密度估计方法的效率。我们将其应用于一系列广泛的任务,包括寻找K手段的良好初始化。