This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space, and its finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q,R. Section 5.3 of Curtin's PhD (2015) pointed out that the proof of this result was wrong. The key step in the original proof attempted to show that the number of iterations can be estimated by multiplying the length of the longest root-to-leaf path in a cover tree by a constant factor. However, this estimate can miss many potential nodes in several branches of a cover tree, that should be considered during the neighbor search. The same argument was unfortunately repeated in several subsequent papers using cover trees from 2006. This paper explicitly constructs challenging datasets that provide counterexamples to the past proofs of time complexity for the cover tree construction, the k-nearest neighbor search presented at ICML 2006, and the dual-tree search algorithm published in NIPS 2009. The corrected near-linear time complexities with extra parameters are proved in another forthcoming paper by using a new compressed cover tree simplifying the original tree structure.
翻译:本文的动因是K- 近邻搜索: 给一个任意的衡量空间, 及其有限的子集( 参考集 R 和查询集 Q ), 设计一个快速算法, 在 R 中找到所有 K- 近邻。 2006 年, Beygelzimer、 Kakade 和 Langford 引入了覆盖树, 以证明邻居搜索Q, R. Curttin's 博士(2015) 第5.3节的近线性时间复杂性, 指出这一结果的证明是错误的。 原始证据中的关键步骤是试图通过将封面树中最长的根对叶路径的长度乘以一个不变系数来估计迭代数。 但是, 这一估计可能会错漏了封面树若干分支中的许多潜在节点, 这一点应该在邻居搜索中加以考虑。 不幸的是, 2006 年 Curttin's 博士的博士博士博士第5.3节指出, 这一结果的证明是错误的。 本文明确构建了具有挑战性的数据集, 提供了过去树盖树构造时间复杂性的证据, K- Near- Leferal 在2009 中, 将使用新的 IM Restrial im recolimal Clas 进行第二次搜索搜索 。