Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms. Given a finite sample, the standard estimate of the target kernel mean is the empirical average. Previous work has shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates the kernel mean under the corrupted distribution. Theoretically, we show that the marginalized kernel mean estimator introduces implicit regularization in kernel mean estimation. Empirically, we show on a variety of datasets that the marginalized kernel mean estimator obtains much lower estimation error than existing estimators.
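The idea described above can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it assumes a Gaussian RBF kernel, isotropic Gaussian corruption noise, and Monte Carlo averaging over corrupted copies; the function names (`empirical_kernel_mean`, `marginalized_kernel_mean`) and parameters (`noise_std`, `n_copies`) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d)

def empirical_kernel_mean(X, points, gamma=1.0):
    # Standard estimator: the empirical average of k(x_i, .)
    # over the sample, evaluated at the given points.
    return rbf_kernel(X, points, gamma).mean(axis=0)

def marginalized_kernel_mean(X, points, noise_std=0.5, n_copies=50,
                             gamma=1.0, seed=None):
    # Corrupt each example with Gaussian noise from a known
    # distribution and average the kernel features of all corrupted
    # copies -- a Monte Carlo approximation of the kernel mean under
    # the corrupted distribution.
    rng = np.random.default_rng(seed)
    noise = noise_std * rng.standard_normal((n_copies,) + X.shape)
    Xc = (X[None, :, :] + noise).reshape(-1, X.shape[1])
    return rbf_kernel(Xc, points, gamma).mean(axis=0)
```

A quick usage sketch: evaluating both estimators on a small sample at a few query points returns one kernel-mean value per query point; the marginalized estimate averages over many noisy copies of each example, which smooths the empirical estimate.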