Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

Kernel density estimation (KDE) is a challenging task in machine learning. The problem is defined as follows: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n\} \subset \mathbb{R}^d$, compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data structures for efficient KDE. However, the proposed KDE data structures focus on static settings, and their robustness to dynamically changing data distributions has not been addressed. In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries. In particular, we provide a theoretical framework for KDE data structures. In our framework, the KDE data structures require only subquadratic space. Moreover, our data structure supports dynamic updates of the dataset in sublinear time. Furthermore, it can answer adaptive queries from a potential adversary in sublinear time.
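To make the problem statement concrete, the following is a minimal sketch of the exact brute-force baseline, not the data structure proposed in this work. It evaluates $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ directly, so each query costs time linear in $n$, which is the cost a sublinear-time structure aims to beat. The Gaussian kernel, the `bandwidth` parameter, and the names `gaussian_kernel` and `NaiveDynamicKDE` are illustrative assumptions that do not come from the abstract.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Assumed example kernel: f(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class NaiveDynamicKDE:
    """Hypothetical brute-force baseline: exact answers, linear-time queries."""

    def __init__(self, points, kernel=gaussian_kernel):
        # Store a private copy of the dataset {x_1, ..., x_n}.
        self.points = [list(p) for p in points]
        self.kernel = kernel

    def update(self, i, new_point):
        """Replace the i-th data point (0-indexed) with new_point."""
        self.points[i] = list(new_point)

    def query(self, y):
        """Return (1/n) * sum_i f(x_i, y) by scanning every stored point."""
        total = sum(self.kernel(x, y) for x in self.points)
        return total / len(self.points)


if __name__ == "__main__":
    kde = NaiveDynamicKDE([[0.0, 0.0], [1.0, 0.0]])
    print(kde.query([0.0, 0.0]))   # average of f(x_1, y) and f(x_2, y)
    kde.update(1, [0.0, 0.0])      # move the second point onto the query
    print(kde.query([0.0, 0.0]))
```

Because this baseline is exact and deterministic, its answers do not depend on earlier queries, so adaptive queries cannot degrade it. The trade-off is the linear scan on every query.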
