Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in nearest neighbor search algorithms. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN), in which we apply Approximate Nearest Neighbor (ANN) algorithms as a black-box subroutine to compute an unbiased KDE. The idea is to use ANN to find the points that contribute most to the KDE, compute their contributions exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument supporting the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can use an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state-of-the-art implementations on all high-dimensional datasets we considered, and matches the performance of RS in cases where the ANN yields no gains in performance.
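The estimator described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `ann_query` callable stands in for an arbitrary black-box ANN index, and the split into an exact near-neighbor term plus a uniformly sampled remainder term is what keeps the estimate unbiased.

```python
import numpy as np

def deann_style_kde(query, data, kernel, ann_query, k=32, m=64, rng=None):
    """Sketch of the DEANN idea: exact kernel contributions from the k
    approximate near neighbors, plus an unbiased random-sampling estimate
    of the remaining points. `ann_query(q, k)` (returning k indices into
    `data`) is an assumed interface for any black-box ANN implementation."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(data)
    near = set(int(i) for i in ann_query(query, k))
    # Points with large contributions: evaluate the kernel exactly.
    exact = sum(kernel(query, data[i]) for i in near)
    # Remaining points: estimate their total contribution by sampling.
    rest = [i for i in range(n) if i not in near]
    if not rest or m == 0:
        return exact / n
    sampled = rng.choice(rest, size=m, replace=True)
    est_rest = (len(rest) / m) * sum(kernel(query, data[i]) for i in sampled)
    return (exact + est_rest) / n
```

With a brute-force `ann_query` and `k = n`, the sampling term vanishes and the sketch reduces to the exact KDE, which makes it easy to sanity-check.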