We propose two statistical hypothesis tests for the regression function of binary classification, based on conditional kernel mean embeddings. The regression function is a fundamental object in classification, as it determines both the Bayes optimal classifier and the misclassification probabilities. A resampling-based framework is presented and combined with consistent point estimators of the conditional kernel mean map in order to construct distribution-free hypothesis tests. The tests are constructed flexibly, so that the exact probability of a type I error can be controlled for any sample size. We also prove that both proposed techniques are consistent under weak statistical assumptions, i.e., the type II error probabilities converge pointwise to zero.
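As a rough illustration of the resampling idea summarized above, the following Python sketch ranks a reference statistic computed from the observed labels against the same statistic computed from labels resampled under the null hypothesis. It is not the construction from the paper: the ±1 label coding, the Gaussian kernel, the kernel ridge estimator standing in for an estimate of the conditional kernel mean map, and all names (`gaussian_kernel`, `reference_statistic`, `resampling_test`, `m`, `q`) are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    # Gram matrix of a Gaussian kernel for 1-D inputs (assumed kernel choice).
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))


def reference_statistic(x, y, f0_vals, lam=0.01):
    # Kernel ridge estimate of E[Y | X = x] at the sample points,
    # compared with the hypothesized regression function in mean squared error.
    n = len(x)
    K = gaussian_kernel(x, x)
    fitted = K @ np.linalg.solve(K + n * lam * np.eye(n), y)
    return float(np.mean((fitted - f0_vals) ** 2))


def resampling_test(x, y, f0, m=20, q=19, rng=None):
    # Test H0: E[Y | X = x] = f0(x), with labels coded as -1 / +1 (assumption).
    rng = np.random.default_rng(rng)
    f0_vals = f0(x)
    p_plus = (1 + f0_vals) / 2  # P(Y = +1 | X = x) under H0

    # Index 0 holds the statistic of the observed sample.
    stats = [reference_statistic(x, y, f0_vals)]
    for _ in range(m - 1):
        # Draw alternative labels from the null model at the same inputs.
        y_alt = np.where(rng.random(len(x)) < p_plus, 1.0, -1.0)
        stats.append(reference_statistic(x, y_alt, f0_vals))
    stats = np.array(stats)

    # Rank the observed statistic, breaking ties uniformly at random.
    tie_breaker = rng.random(m)
    order = np.lexsort((tie_breaker, stats))  # primary key: stats
    rank = int(np.where(order == 0)[0][0]) + 1

    # Reject when the observed statistic is among the m - q largest.
    return rank > q, rank
```

Under the null hypothesis the observed and resampled label vectors are exchangeable given the inputs, so with random tie-breaking the rank is uniform on {1, ..., m} and the sketch rejects with probability exactly (m - q)/m, for example 5% with m = 20 and q = 19, regardless of the sample size.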