We study universal consistency and convergence rates of simple nearest-neighbor prototype rules for the problem of multiclass classification in metric spaces. We first show that a novel data-dependent partitioning rule, named Proto-NN, is universally consistent in any metric space that admits a universally consistent rule. Proto-NN is a significant simplification of OptiNet, a recently proposed compression-based algorithm that, to date, was the only algorithm known to be universally consistent in such a general setting. In practice, Proto-NN is simpler to implement and enjoys reduced computational complexity. We then proceed to study convergence rates of the excess error probability. We first obtain rates for the standard $k$-NN rule under a margin condition and a new generalized-Lipschitz condition. The latter extends a recently proposed modified-Lipschitz condition from $\mathbb{R}^d$ to metric spaces. Like the modified-Lipschitz condition, the new condition avoids any boundedness assumptions on the data distribution. While obtaining rates for Proto-NN remains open, we show that a second prototype rule, which hybridizes between $k$-NN and Proto-NN, achieves the same rates as $k$-NN while enjoying computational advantages similar to those of Proto-NN. However, like $k$-NN, this hybrid rule is not consistent in general.
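For concreteness, the standard $k$-NN rule referred to above can be stated as follows; the notation $(\mathcal{X}, \rho)$ for the metric space, $M$ for the number of classes, and $h^\star$ for the Bayes classifier is introduced here for illustration and is not taken from the abstract itself. Given a sample $(X_1, Y_1), \dots, (X_n, Y_n)$ and a query point $x$, the rule takes a majority vote among the $k$ nearest neighbors of $x$:
\[
\hat{h}_n(x) \;=\; \operatorname*{arg\,max}_{c \in \{1, \dots, M\}} \sum_{i=1}^{k} \mathbf{1}\bigl\{ Y_{(i)}(x) = c \bigr\},
\]
where $(X_{(1)}(x), Y_{(1)}(x)), \dots, (X_{(n)}(x), Y_{(n)}(x))$ denotes the sample reordered so that $\rho(X_{(1)}(x), x) \le \dots \le \rho(X_{(n)}(x), x)$. The excess error probability whose convergence rates are studied is then $\mathbb{P}\bigl(\hat{h}_n(X) \neq Y\bigr) - \mathbb{P}\bigl(h^\star(X) \neq Y\bigr)$, the gap to the Bayes-optimal classifier $h^\star$.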