In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classifications is used to classify every point in $\mathbb{R}^d$: every point receives the same classification as its nearest neighbor in $P$. Recently, Eppstein [SOSA'22] developed an algorithm to detect the relevant training points, that is, those points $p\in P$ such that $P$ and $P\setminus\{p\}$ induce different classifications. We investigate the problem of finding a minimum-cardinality reduced training set $P'\subseteq P$ such that $P$ and $P'$ induce the same classification. We show that the set of relevant points is such a minimum-cardinality reduced training set if $P$ is in general position. Furthermore, we show that finding a minimum-cardinality reduced training set for possibly degenerate $P$ is solvable in polynomial time for $d=1$ and NP-complete for $d\geq 2$.
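To make the definition of a relevant point concrete, the following is a minimal brute-force sketch for the case $d=1$. It is not Eppstein's algorithm; it simply removes each point in turn and compares the induced classifications. The function names `decision_structure` and `relevant_points` are hypothetical, and the sketch assumes distinct coordinates and ignores the classification of the boundary points themselves.

```python
from fractions import Fraction


def decision_structure(points):
    """Describe the 1D nearest-neighbor classification induced by `points`.

    `points` is an iterable of (coordinate, label) pairs with distinct
    coordinates (an assumption of this sketch). Returns (boundaries, labels):
    labels[i] is the class of the i-th maximal interval from left to right,
    and boundaries[i] is the midpoint separating interval i from interval i + 1.
    """
    ordered = sorted(points, key=lambda pair: pair[0])
    boundaries, labels = [], []
    prev_x = None
    for x, label in ordered:
        if not labels:
            labels.append(label)
        elif label != labels[-1]:
            # The class changes halfway between two consecutive points
            # that carry different labels.
            boundaries.append((Fraction(prev_x) + Fraction(x)) / 2)
            labels.append(label)
        prev_x = x
    return boundaries, labels


def relevant_points(points):
    """Return the points whose removal changes the induced classification."""
    points = list(points)
    full = decision_structure(points)
    return [
        p
        for i, p in enumerate(points)
        if decision_structure(points[:i] + points[i + 1:]) != full
    ]


# Illustrative data, not taken from the paper.
P = [(0, "a"), (1, "a"), (2, "a"), (4, "b"), (6, "b")]
reduced = relevant_points(P)
print(reduced)
print(decision_structure(reduced) == decision_structure(P))
```

In this example only the two points adjacent to the class change are relevant, and keeping just those two points induces the same classification as the full set, in line with the general-position statement above.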