使用压缩封面树搜索 K 最近邻居的新的近线时间算法 (A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree)

Given a reference set $R$ of $n$ points and a query set $Q$ of $m$ points in a metric space, this paper studies an important problem of finding $k$-nearest neighbors of every point $q \in Q$ in the set $R$ in a near-linear time. In the paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that that its construction requires $O(n\log n)$ time while the nearest neighbor search needs $O(n\log m)$ time with a hidden dimensionality factor. In 2015, section~5.3 of Curtin's PhD thesis pointed out that the proof of the latter claim might contain a serious gap in time estimation. In the paper at TopoInVis 2022, the authors built explicit counterexamples for a key step in the proofs of both claims. The past obstacles will be overcome by a simpler compressed cover tree on the reference set $R$. The first new algorithm constructs a compressed cover tree in $O(n \log n)$ time. The second new algorithm finds all $k$-nearest neighbors of all points from $Q$ using a compressed cover tree in time $O(m(k+\log n)\log k)$ with a hidden dimensionality factor depending on point distributions of the sets $R,Q$ but not on their sizes.

翻译：鉴于参考值设定了美元点数和查询设定了美元点数,本文研究了一个重要问题,即在近线时间里找到每个点点的近邻$Q$$美元,这是在近线时间里找到每个点的近邻$Q美元的一个重要问题。在2006年ICML、Beygelzimer、Kakade和Langford的论文中,作者为这两项索赔的证据引入了一个封面树,并试图证明其构造需要花费O(n\log n)美元的时间,而最近的邻居搜索需要花费O(n\log m)美元的时间以隐藏的维度系数为美元。2015年,Curtin博士博士论文第~5.3节指出,后一项索赔的证据可能包含一个严重的时间估计差距。在2022年的TopoInvis的论文中,作者为这两项索赔的证据中的一个关键步骤建立了明确的反抽样。如果在参考值设定的$R美元上用更简单的压缩的封面树将克服过去的障碍。首个新的算法将用$n-nlog$的美元比例构建一个压缩封面树(n n_xxxxxxxx) 时间里所有新的算的硬基的美元。