Recently, Qiao, Duan, and Cheng~(2019) proposed a distributed nearest-neighbor classification method, in which a massive dataset is split into smaller groups, each processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these groupwise class labels. This paper shows that, under some regularity conditions, the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor, for both regression and classification problems. Roughly speaking, distributed 1-nearest-neighbor rules with $M$ groups have performance comparable to that of standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain the exact minimax optimal rates.
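For concreteness, the following is a minimal sketch of the distributed rule described above, not the authors' implementation: the training data are split into $M$ groups, a 1-nearest-neighbor prediction is computed within each group, and the final label is a majority vote. Binary $\{0,1\}$ labels are assumed for simplicity, and the function name is hypothetical.

```python
import numpy as np

def distributed_1nn_predict(X_train, y_train, X_test, M, seed=None):
    """Distributed 1-NN (sketch): split the training set into M random groups,
    run 1-NN within each group, and predict by majority vote over the M
    groupwise labels. Assumes binary labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X_train))
    groups = np.array_split(perm, M)
    votes = np.empty((M, len(X_test)), dtype=y_train.dtype)
    for m, idx in enumerate(groups):
        Xg, yg = X_train[idx], y_train[idx]
        # 1-NN within group m: label of the closest training point
        dists = np.linalg.norm(X_test[:, None, :] - Xg[None, :, :], axis=2)
        votes[m] = yg[np.argmin(dists, axis=1)]
    # Majority vote across the M groupwise predictions
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Heuristically, the fraction of groups voting for class 1 plays the role of the $\Theta(M)$-nearest-neighbor vote proportion, which is why the two rules behave comparably.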