This paper considers enumerating answers to similarity-join queries under dynamic updates: given two sets $A, B$ of $n$ points in $\mathbb{R}^d$, a metric $\phi(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $\phi(a,b) \le r$. Our goal is to store $A, B$ in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be updated efficiently when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of similarity joins, we present near-linear-size data structures for the $\ell_1$ and $\ell_\infty$ metrics with $\log^{O(1)} n$ update time and delay. We show that such a data structure is not feasible for the $\ell_2$ metric for $d \ge 4$. For approximate enumeration of similarity joins, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the $\ell_p$ metrics, with $\log^{O(1)} n$ delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality-sensitive hashing (LSH).
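To make the problem statement concrete, the following is a minimal brute-force sketch of the dynamic similarity-join interface (insert, delete, enumerate). It is not one of the data structures proposed in the paper: the class name `NaiveSimilarityJoin`, the helper `lp_distance`, and the `side` parameter are hypothetical names chosen for illustration. Enumeration here scans all of $A \times B$, so the time between two consecutive reported pairs is not bounded by $\log^{O(1)} n$; the sketch only fixes the semantics that an efficient structure has to reproduce.

```python
from typing import Iterator, Set, Tuple

Point = Tuple[float, ...]


def lp_distance(a: Point, b: Point, p: float = 2.0) -> float:
    """l_p distance between two points; p = float('inf') gives l_infinity."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


class NaiveSimilarityJoin:
    """Brute-force baseline (illustrative only, not the paper's structure)."""

    def __init__(self, r: float, p: float = 2.0) -> None:
        self.r = r  # distance threshold
        self.p = p  # which l_p metric to use
        self.A: Set[Point] = set()
        self.B: Set[Point] = set()

    def insert(self, side: str, point: Point) -> None:
        # side is "A" or "B" (a hypothetical convention for this sketch)
        (self.A if side == "A" else self.B).add(point)

    def delete(self, side: str, point: Point) -> None:
        (self.A if side == "A" else self.B).discard(point)

    def enumerate_pairs(self) -> Iterator[Tuple[Point, Point]]:
        # Lazily yields every (a, b) with distance at most r.
        # The scan may pass many non-matching pairs between two outputs,
        # so there is no worst-case delay guarantee.
        for a in self.A:
            for b in self.B:
                if lp_distance(a, b, self.p) <= self.r:
                    yield (a, b)


join = NaiveSimilarityJoin(r=1, p=1)  # l_1 metric, threshold 1
for pt in [(0, 0), (5, 5)]:
    join.insert("A", pt)
for pt in [(1, 0), (5, 6), (10, 10)]:
    join.insert("B", pt)
pairs_before = set(join.enumerate_pairs())
join.delete("B", (5, 6))
pairs_after = set(join.enumerate_pairs())
```

Updates in this baseline take constant expected time, but each enumeration costs time proportional to $|A| \cdot |B|$ regardless of the output size, which is the gap the delay guarantee is meant to close.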