$k$d-trees are widely used in parallel databases to support efficient neighborhood/similarity queries. Efficient support for parallel updates to $k$d-trees is therefore important. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a $k$d-tree that allows for efficient parallel $k$-NN queries over dynamically changing point sets. A BDL-tree consists of a log-structured set of $k$d-trees, which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with $n$ points, each batch of $B$ updates takes $O(B\log^2{(n+B)})$ amortized work and $O(\log(n+B)\log\log{(n+B)})$ depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious $k$d-tree construction and parallel Bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading, using a variety of synthetic and real-world datasets, show that our implementation of BDL-tree achieves a self-relative speedup of up to $34.8\times$ ($28.4\times$ on average) for batch insertions, up to $35.5\times$ ($27.2\times$ on average) for batch deletions, and up to $46.1\times$ ($40.0\times$ on average) for $k$-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for $k$-NN queries. We compare to two baseline $k$d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates.
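The idea of a log-structured set of static structures can be illustrated with a small sequential sketch. It is not the BDL-tree implementation: the class name `LogStructuredPointSet` and its methods are hypothetical, each level is a plain Python list standing in for a static $k$d-tree, $k$-NN search inside a level is brute force, and batch deletions and all parallelism are omitted. What it does show is the binary-counter layout, where level $i$ is either empty or holds exactly $2^i$ points, and a batch insertion only rebuilds the levels touched by the carry.

```python
import heapq


class LogStructuredPointSet:
    """Sequential sketch of a log-structured set of static structures.

    Assumption: each level is a plain list standing in for a static kd-tree.
    Level i is either None or holds exactly 2**i points.
    """

    def __init__(self):
        self.levels = []

    def __len__(self):
        return sum(len(level) for level in self.levels if level is not None)

    def batch_insert(self, points):
        # Binary addition: the pool plays the role of the carry.
        # Invariant: on reaching level i, len(pool) is a multiple of 2**i.
        pool = list(points)
        i = 0
        while pool:
            if i == len(self.levels):
                self.levels.append(None)
            size = 1 << i
            if len(pool) & size:
                if self.levels[i] is not None:
                    # Occupied level: dissolve it into the pool (carry up).
                    pool.extend(self.levels[i])
                    self.levels[i] = None
                else:
                    # Empty level: "build" a static structure of 2**i points.
                    self.levels[i] = pool[-size:]
                    del pool[-size:]
            i += 1

    def knn(self, query, k):
        def sq_dist(p):
            return sum((a - b) ** 2 for a, b in zip(p, query))

        # Query every non-empty level, then merge the per-level candidates.
        candidates = []
        for level in self.levels:
            if level is not None:
                candidates.extend(heapq.nsmallest(k, level, key=sq_dist))
        return heapq.nsmallest(k, candidates, key=sq_dist)
```

For example, inserting a batch of 5 points and then a batch of 6 leaves 11 points spread over levels of size 1, 2, and 8, matching the binary representation of 11. Each point moves up through at most a logarithmic number of levels over its lifetime, which is the intuition behind an amortized bound of the kind stated above.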