In this article, we propose a new model-free feature screening method based on energy distances for ultrahigh-dimensional binary classification problems. Unlike existing methods, the cut-off involved in our procedure is data-adaptive. With high probability, the screened set retains only the informative features and discards all the noise variables. The proposed screening method is then extended to identify pairs of variables that are marginally undetectable but differ in their joint distributions. Finally, we build a classifier that maintains coherence between the proposed feature selection criterion and the discrimination method, and we establish its risk consistency. An extensive numerical study with simulated and real benchmark data sets shows clear and convincing advantages of our classifier over existing methods in the literature.
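
To make the idea of marginal screening with energy distances concrete, the following is a minimal sketch and not the procedure proposed in the article. It assumes the standard V-statistic estimate of the energy distance, 2E|X - Y| - E|X - X'| - E|Y - Y'|, computed one feature at a time between the two classes. The function names (`energy_distance_1d`, `marginal_energy_scores`), the simulated data, and the choice to simply rank features are all illustrative assumptions; the data-adaptive cut-off and the extension to pairs of variables are not reproduced here.

```python
import numpy as np

def energy_distance_1d(x, y):
    """V-statistic estimate of the energy distance between two univariate samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cross = np.abs(x[:, None] - y[None, :]).mean()     # estimates E|X - Y|
    within_x = np.abs(x[:, None] - x[None, :]).mean()  # estimates E|X - X'|
    within_y = np.abs(y[:, None] - y[None, :]).mean()  # estimates E|Y - Y'|
    return 2.0 * cross - within_x - within_y

def marginal_energy_scores(X, labels):
    """Score each column of X by the energy distance between class 0 and class 1."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    X0, X1 = X[labels == 0], X[labels == 1]
    return np.array([energy_distance_1d(X0[:, j], X1[:, j])
                     for j in range(X.shape[1])])

# Hypothetical simulated example: 200 observations, 200 features, 2 of them informative.
rng = np.random.default_rng(0)
n, p = 200, 200
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[labels == 1, 0] += 2.0  # feature 0: location difference between classes
X[labels == 1, 1] *= 3.0  # feature 1: scale difference, same mean

scores = marginal_energy_scores(X, labels)
ranking = np.argsort(scores)[::-1]          # features ordered from highest to lowest score
top_two = set(ranking[:2].tolist())
```

Because the energy distance compares whole distributions rather than means alone, the scale-only difference in feature 1 is scored as signal as well, which is the sense in which such screening is model-free. In this sketch the number of retained features is fixed by hand; replacing that with a threshold learned from the data is the part the article addresses.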