Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in modern statistics and data science, including range search in decision trees, clustering, nearest-neighbor search, and local regression. In this article we present a scalable mechanism for constructing $k$-d trees on distributed data, based on approximating the median at each recursive subdivision of the data. We provide theoretical guarantees on the quality of the approximation, along with a simulation study quantifying the accuracy and scalability of the proposed approach in practice.
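To make the idea of splitting on an approximate median concrete, the following is a minimal single-machine sketch, not the construction proposed in the article. It assumes the median is approximated by taking the exact median of a random subsample, which is only a stand-in for whatever distributed approximation is actually used. The names `Node`, `approx_median`, `build_kdtree`, `leaves`, `leaf_size`, and `sample_size` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Internal nodes carry (axis, split); leaves carry their points."""
    axis: int = -1
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: Optional[List[Point]] = None  # set only on leaves


def approx_median(values: List[float], sample_size: int,
                  rng: random.Random) -> float:
    """Assumed approximation: exact median of a random subsample."""
    if len(values) > sample_size:
        values = rng.sample(values, sample_size)
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def build_kdtree(points: Sequence[Point], depth: int = 0,
                 leaf_size: int = 8, sample_size: int = 64,
                 rng: Optional[random.Random] = None) -> Node:
    """Recursively split on an approximate median, cycling through axes."""
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return Node(points=list(points))
    axis = depth % len(points[0])
    split = approx_median([p[axis] for p in points], sample_size, rng)
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    if not left or not right:
        # Degenerate split (e.g. many tied coordinates): stop and make a leaf.
        return Node(points=list(points))
    return Node(axis=axis, split=split,
                left=build_kdtree(left, depth + 1, leaf_size, sample_size, rng),
                right=build_kdtree(right, depth + 1, leaf_size, sample_size, rng))


def leaves(node: Node) -> Iterator[List[Point]]:
    """Yield the point list stored in each leaf, left to right."""
    if node.points is not None:
        yield node.points
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)
```

In this sketch a call such as `build_kdtree(data, leaf_size=8, sample_size=64)` returns the root `Node`. Because each split is only approximately central, the two subtrees may be unbalanced; bounding that imbalance is the kind of question the theoretical guarantees mentioned above address.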