Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees, but they often lack flexibility. In contrast, recent machine learning classification algorithms, such as those based on random forests (RF) or neural networks (NN), perform very well in practice but lack interpretability and theoretical guarantees. In this paper, we bring the two together by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends recent work in the multiple testing literature, notably that of Yang et al. (2021), to the high-dimensional setting. AdaDetect is shown both to strongly control the FDR and to have power that mimics that of the oracle in a specific sense. The interest and validity of our approach are demonstrated with theoretical results, numerical experiments on several benchmark datasets, and an application to astrophysical data. In particular, while AdaDetect can be used in combination with any classifier, it is particularly efficient on real-world datasets with RF, and on images with NN.
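The abstract does not spell out the procedure itself. As a rough illustration of the general recipe used in this line of work, the sketch below turns novelty scores into empirical p-values calibrated on a sample of known nulls, then applies the Benjamini-Hochberg step-up rule at level alpha. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `empirical_p_values` and `benjamini_hochberg` are hypothetical, the scores are assumed to come from any classifier (for example an RF or NN), and a larger score is assumed to mean "more novel".

```python
from bisect import bisect_left


def empirical_p_values(null_scores, test_scores):
    """Empirical p-value of each test score against a null calibration sample.

    p_i = (1 + #{null scores >= test score i}) / (n_null + 1).
    Assumption: larger scores indicate stronger evidence of novelty.
    """
    sorted_null = sorted(null_scores)
    n_null = len(sorted_null)
    p_values = []
    for score in test_scores:
        # bisect_left counts null scores strictly below `score`,
        # so the remainder is the number of null scores >= `score`.
        n_ge = n_null - bisect_left(sorted_null, score)
        p_values.append((1 + n_ge) / (n_null + 1))
    return p_values


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_rejected = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= alpha * k / m.
        if p_values[idx] <= alpha * rank / m:
            n_rejected = rank
    return sorted(order[:n_rejected])


# Toy usage with made-up scores (hypothetical data, for illustration only).
null_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
test_scores = [0.95, 0.05, 0.99]
p_values = empirical_p_values(null_scores, test_scores)
detections = benjamini_hochberg(p_values, alpha=0.2)
```

In this toy run the first and third test points score above every calibration null, so they receive the smallest attainable p-value, 1/(9+1), and are the ones flagged as novelties. The score function is deliberately left abstract, since the abstract states that any classifier can be plugged in.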