We propose a new model-free feature screening method based on energy distances for ultrahigh-dimensional binary classification problems. Unlike existing methods, the cut-off involved in our procedure is data-adaptive. With high probability, the proposed method discards all the noise variables and retains only the relevant features. The screening method is also extended to identify pairs of variables that are marginally undetectable but differ in their joint distributions. Finally, we build a classifier that maintains coherence between the proposed feature selection criterion and the discrimination method, and we establish its risk consistency. An extensive numerical study with simulated and real benchmark data sets shows clear and convincing advantages of our classifier over state-of-the-art methods.
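To make the idea of marginal screening with energy distances concrete, the following is a minimal sketch and not the procedure of this paper. It assumes the standard V-statistic estimate of the energy distance, 2E|X-Y| - E|X-X'| - E|Y-Y'|, computed separately for each feature between the two classes, and it only ranks the features. The data-adaptive cut-off, the pairwise extension, and the classifier are not reproduced here. The function names `energy_distance_1d` and `rank_features`, and the simulated data, are hypothetical choices made for illustration.

```python
import numpy as np


def energy_distance_1d(x, y):
    """V-statistic estimate of the energy distance between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mean absolute difference between the samples and within each sample.
    between = np.abs(x[:, None] - y[None, :]).mean()
    within_x = np.abs(x[:, None] - x[None, :]).mean()
    within_y = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * between - within_x - within_y


def rank_features(X, labels):
    """Score every column of X by the energy distance between class 0 and class 1.

    Returns the scores and the feature indices sorted from largest to smallest score.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    X0, X1 = X[labels == 0], X[labels == 1]
    scores = np.array(
        [energy_distance_1d(X0[:, j], X1[:, j]) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)
    return scores, order


# Illustrative simulated data (an assumption, not taken from the paper):
# 60 observations, 200 features, of which only the first two are relevant.
rng = np.random.default_rng(0)
n, p = 60, 200
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[labels == 1, 0] += 3.0  # feature 0: location difference between classes
X[labels == 1, 1] *= 3.0  # feature 1: scale difference, equal means

scores, order = rank_features(X, labels)
print("top-ranked features:", order[:5])
```

Because the energy distance compares whole distributions rather than means alone, a feature such as the scale-shifted one above also receives a positive population score, which is the sense in which such screening is model-free. How many top-ranked features to keep is exactly the question a data-adaptive cut-off addresses; this sketch leaves that choice open.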