Kernel density estimation (KDE) is one of the most widely used nonparametric density estimation methods. Because it is memory-based, i.e., it uses the entire training data set for every prediction, it is unsuitable for most current big data applications. Several strategies, such as tree-based or hashing-based estimators, have been proposed to improve its efficiency. The novel density matrix kernel density estimation method (DMKDE) uses density matrices, a quantum mechanical formalism, and random Fourier features, an explicit kernel approximation, to produce density estimates. The method has its roots in KDE and can be regarded as an approximation of it that avoids the memory-based restriction. In this paper, we systematically evaluate the DMKDE algorithm and compare it with other state-of-the-art fast procedures for approximating KDE on different synthetic data sets. Our experimental results show that DMKDE is on par with its competitors in computing density estimates and shows advantages on high-dimensional data. We have made all the code available as an open-source software repository.
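To make the idea concrete, the following is a minimal NumPy sketch of how a density-matrix estimator built on random Fourier features might look. It is not the authors' released code. The function names (`make_rff`, `fit_density_matrix`, `dmkde_density`, `exact_kde`), the Gaussian kernel exp(-gamma * ||x - y||^2), and the normalisation constant (pi / (2 * gamma))^(d/2) are assumptions chosen for illustration. The point it shows is that the training set is compressed once into a fixed-size matrix, so a query no longer needs to touch every training point.

```python
import numpy as np


def make_rff(dim, n_features, gamma, rng):
    """Return a random Fourier feature map phi with
    phi(x) . phi(y) ~= exp(-gamma * ||x - y||^2) (assumed Gaussian kernel)."""
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    w = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(x):
        return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

    return phi


def fit_density_matrix(phi, x_train):
    """Average of the outer products phi(x_i) phi(x_i)^T over the training set.
    Its size depends on the number of features, not on the number of samples."""
    z = phi(x_train)                      # shape (N, D)
    return z.T @ z / len(x_train)         # shape (D, D)


def dmkde_density(phi, rho, x_query, gamma):
    """Density estimate phi(x)^T rho phi(x), divided by an assumed constant."""
    z = phi(x_query)
    dim = x_query.shape[1]
    # phi^T rho phi approximates the mean of the squared kernel,
    # exp(-2*gamma*||x - x_i||^2), whose integral is (pi / (2*gamma))^(dim/2).
    norm = (np.pi / (2.0 * gamma)) ** (dim / 2.0)
    return np.sum((z @ rho) * z, axis=1) / norm


def exact_kde(x_train, x_query, gamma):
    """Memory-based reference: every query visits every training point."""
    sq_dist = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    dim = x_query.shape[1]
    norm = (np.pi / (2.0 * gamma)) ** (dim / 2.0)
    return np.exp(-2.0 * gamma * sq_dist).mean(axis=1) / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(500, 1))
    x_query = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
    phi = make_rff(dim=1, n_features=2000, gamma=1.0, rng=rng)
    rho = fit_density_matrix(phi, x_train)
    gap = np.abs(dmkde_density(phi, rho, x_query, 1.0)
                 - exact_kde(x_train, x_query, 1.0))
    print("max abs difference from exact KDE:", gap.max())
```

In this sketch the cost of a query depends on the number of random features rather than on the training-set size, which is the trade-off the abstract describes: some approximation error in exchange for dropping the memory-based restriction.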