In nearest-neighbor classification problems, a set of $d$-dimensional training points is given, each with a known classification, and is used to infer the unknown classifications of other points: each query point receives the classification of its nearest training point. A training point is relevant if omitting it from the training set would change the outcome of at least one of these inferences. We provide a simple algorithm for thinning a training set down to its subset of relevant points, using as subroutines algorithms for finding the minimum spanning tree of a set of points and for finding the extreme points (convex hull vertices) of a set of points. The time bounds for our algorithm, in any constant dimension $d\ge 3$, improve on those of a previous algorithm for the same problem by Clarkson (FOCS 1994).
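To make the notion of a relevant point concrete, the following minimal sketch works through the one-dimensional case only. It is not the algorithm described above, which targets $d\ge 3$ and relies on minimum spanning trees and extreme points. It assumes distinct coordinates, in which case a point is relevant exactly when a neighbor in sorted order carries a different label. The function names `nn_classify` and `relevant_points_1d` and the sample data are hypothetical, chosen for illustration.

```python
def nn_classify(train, query):
    """Return the label of the training point nearest to `query` (brute force, 1-D)."""
    return min(train, key=lambda p: abs(p[0] - query))[1]


def relevant_points_1d(train):
    """Return the relevant points of a 1-D training set.

    Assumes distinct coordinates. A point is kept when the point just before
    or just after it in sorted order has a different label, because only such
    points border a decision boundary.
    """
    pts = sorted(train)
    relevant = []
    for i, (x, label) in enumerate(pts):
        left_differs = i > 0 and pts[i - 1][1] != label
        right_differs = i + 1 < len(pts) and pts[i + 1][1] != label
        if left_differs or right_differs:
            relevant.append((x, label))
    return relevant


# Hypothetical training set: (coordinate, label) pairs.
train = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
thinned = relevant_points_1d(train)
```

In this example only the two points on either side of the single decision boundary survive, and classifying any query against `thinned` gives the same label as classifying it against `train`.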