Abstract. We present PAC-Bayesian bounds for the generalisation error of the K-nearest-neighbour (K-NN) classifier. This is achieved by casting the K-NN classifier into a kernel space framework in the limit of vanishing kernel bandwidth. We establish a relation between prior measures over the coefficients in the kernel expansion and the induced measure on the weight vectors in kernel space. Defining a sparse prior over the coefficients allows the application of a PAC-Bayesian folk theorem, which yields a generalisation bound that is a function of the number of redundant training examples: those that can be left out without changing the solution. The presented bound requires quantifying a prior belief in the sparseness of the solution and is evaluated after learning, when the actual redundancy level is known. Even for small sample sizes (m ~ 100) the bound gives non-trivial results when both the expected sparseness and the actual redundancy are high.
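To fix ideas, a compression-style bound of the kind appealed to here takes the following representative form (illustrative notation; not necessarily the exact statement proved below): if the solution can be reconstructed from d of the m training examples, so that the remaining m - d are redundant, then for any classifier h consistent with the training sample, with probability at least 1 - \delta over the draw of the sample,

\[
  R(h) \;\leq\; \frac{1}{m-d}\left(\ln\binom{m}{d} + \ln m + \ln\frac{1}{\delta}\right),
\]

which is non-trivial precisely when the redundancy m - d is large relative to \ln\binom{m}{d}, i.e. when the solution is sparse.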