The Neyman-Pearson (NP) binary classification paradigm constrains the more severe type of error (e.g., the type I error) below a prespecified level while minimizing the other (e.g., the type II error). This paradigm suits applications such as severe disease diagnosis and fraud detection, among others. A series of NP classifiers has been developed to guarantee type I error control with high probability. However, these existing classifiers involve a sample-splitting step: a mixture of class 0 and class 1 observations is used to construct a scoring function, and some left-out class 0 observations are used to construct a threshold. This splitting allows the classifier construction to rely on independence, but it amounts to an insufficient use of the data for training and a potentially higher type II error. Leveraging a canonical linear discriminant analysis model, we derive a quantitative CLT for a certain functional of quadratic forms involving the inverses of the sample and population covariance matrices, and based on this result, we develop, for the first time, NP classifiers that do not split the training sample. Numerical experiments confirm the advantages of our new non-splitting parametric strategy.
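To make the split-based paradigm that this work improves upon concrete, the following is a minimal sketch of an NP classifier built with sample splitting: a scoring function trained on mixed class 0 and class 1 data, and a threshold chosen from left-out class 0 scores via an order-statistic rule so that the type I error stays below alpha with probability at least 1 - delta. The logistic-regression score, the half-half split, and the function name are illustrative assumptions, not the construction proposed in this paper.

```python
# Sketch of a split-based NP classifier (the existing paradigm the abstract
# contrasts against, not the paper's non-splitting method). Scoring function
# and split proportions are illustrative choices.
import numpy as np
from scipy.stats import binom
from sklearn.linear_model import LogisticRegression

def np_classifier_with_splitting(X0, X1, alpha=0.05, delta=0.05, seed=0):
    """Train a score on mixed data, then pick a threshold from left-out
    class 0 scores so that P(type I error > alpha) <= delta."""
    rng = np.random.default_rng(seed)
    # Split class 0: half for training the score, half left out for the threshold.
    idx = rng.permutation(len(X0))
    X0_train, X0_hold = X0[idx[: len(X0) // 2]], X0[idx[len(X0) // 2 :]]

    # Scoring function fit on a mixture of class 0 and class 1 observations.
    X_train = np.vstack([X0_train, X1])
    y_train = np.concatenate([np.zeros(len(X0_train)), np.ones(len(X1))])
    score = LogisticRegression().fit(X_train, y_train)

    # Left-out class 0 scores, sorted ascending; independence of these scores
    # from the fitted score function is what the splitting buys.
    t = np.sort(score.decision_function(X0_hold))
    n = len(t)

    # Choose the smallest order statistic t[k-1] whose violation probability
    # P(type I error > alpha) = P(Binomial(n, 1 - alpha) >= k) is at most delta.
    ks = np.arange(1, n + 1)
    violation = binom.sf(ks - 1, n, 1 - alpha)
    feasible = np.where(violation <= delta)[0]
    if len(feasible) == 0:
        raise ValueError("Too few left-out class 0 points for this (alpha, delta).")
    threshold = t[feasible[0]]

    # Classify as class 1 when the score exceeds the threshold.
    return lambda X: (score.decision_function(X) > threshold).astype(int)
```

Because the threshold comes only from the held-out class 0 scores, the high-probability type I error guarantee holds regardless of how the scoring function was trained; the cost, as noted above, is that part of the class 0 sample is withheld from training, which can inflate the type II error.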