Density estimation via discrepancy-based sequential partition (DSP) was proposed to generalize histogram statistics to higher dimensions: it learns an adaptive piecewise constant approximation defined on a binary sequential partition of the underlying domain, using the star discrepancy to measure the uniformity of the particle distribution. However, computing the star discrepancy is NP-hard, and the star discrepancy satisfies neither reflection invariance nor rotation invariance. To address both issues, we replace the star discrepancy with the mixture discrepancy and with a comparison of moments, leading to density estimation via mixture-discrepancy-based sequential partition (DSP-mix) and density estimation via moment-based sequential partition (MSP), respectively. Both DSP-mix and MSP are computationally tractable and are invariant under reflection and rotation. Numerical experiments on reconstructing Beta mixtures, Gaussian mixtures, and heavy-tailed Cauchy mixtures in up to 30 dimensions demonstrate that MSP maintains the same accuracy as DSP while running two to twenty times faster for large sample sizes. DSP-mix achieves satisfactory accuracy and improves efficiency in low-dimensional tests ($d \le 6$), but may lose accuracy in high-dimensional problems because of a reduction in partition level.
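To make the tractability and invariance claims concrete, the following is a minimal sketch of the squared mixture discrepancy on the unit cube, assuming the standard closed-form expression of Zhou, Fang and Ning. It illustrates the quantity only and is not the DSP-mix implementation; the function name `mixture_discrepancy_sq` and the helpers `reflect` and `rotate90` are hypothetical names introduced here.

```python
from math import prod


def mixture_discrepancy_sq(points):
    """Squared mixture discrepancy of points in [0, 1]^d.

    Assumes the standard closed form; cost is O(n^2 * d),
    so no NP-hard search is involved.
    """
    n = len(points)
    d = len(points[0])
    # Distance of every coordinate from the cube centre 1/2.
    dev = [[abs(x - 0.5) for x in p] for p in points]

    # Single-point term: sum over points of a product over coordinates.
    single = sum(
        prod(5.0 / 3.0 - 0.25 * a - 0.25 * a * a for a in row) for row in dev
    )

    # Pairwise term: sum over all ordered pairs (i, k), including i == k.
    pair = 0.0
    for i in range(n):
        for k in range(n):
            pair += prod(
                15.0 / 8.0
                - 0.25 * dev[i][j]
                - 0.25 * dev[k][j]
                - 0.75 * abs(points[i][j] - points[k][j])
                + 0.5 * (points[i][j] - points[k][j]) ** 2
                for j in range(d)
            )

    return (19.0 / 12.0) ** d - 2.0 / n * single + pair / n**2


def reflect(points):
    """Reflect every coordinate about 1/2 (hypothetical helper)."""
    return [[1.0 - x for x in p] for p in points]


def rotate90(points):
    """Rotate 2-D points by 90 degrees about the cube centre (hypothetical helper)."""
    return [[1.0 - y, x] for x, y in points]
```

Because the expression depends only on the distances $|x_{ij} - 1/2|$ and $|x_{ij} - x_{kj}|$, evaluating it on `reflect(points)` or `rotate90(points)` gives the same value as on `points`, up to floating-point rounding.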