In this paper, we propose a model-free feature selection method for ultra-high dimensional data with a massive number of features. The method is a two-phase procedure that combines the fused Kolmogorov filter with random-forest-based recursive feature elimination (RFE) to remove model restrictions and reduce computational complexity. It is fully nonparametric and works with various types of datasets. It has several appealing characteristics, namely accuracy, freedom from model assumptions, and computational efficiency, and it can be applied to a wide range of practical problems, such as multiclass classification, nonparametric regression, and Poisson regression. We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions. We further demonstrate through simulations and real-data examples that the proposed method outperforms existing methods.
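To make the first phase concrete, the following is a minimal sketch of a fused-Kolmogorov-style screening step, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes a continuous response sliced at empirical quantiles, the slice counts `(3, 4, 5)` are arbitrary illustrative choices, and the names `ks_statistic`, `slice_labels`, `fused_kolmogorov_scores`, `screen`, and `make_demo_data` are hypothetical. The second phase (random-forest-based RFE on the surviving features) is not shown.

```python
import random
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past x (this also handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def slice_labels(y, n_slices):
    """Assign each response to one of n_slices roughly equal quantile slices."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_slices // len(y)
    return labels


def fused_kolmogorov_scores(X, y, slice_counts=(3, 4, 5)):
    """Score each feature by summing, over several slicings of y, the largest
    KS distance between the feature's distributions in any two slices."""
    n_features = len(X[0])
    scores = [0.0] * n_features
    for g in slice_counts:  # illustrative slice counts (an assumption)
        labels = slice_labels(y, g)
        for j in range(n_features):
            groups = [[] for _ in range(g)]
            for row, lab in zip(X, labels):
                groups[lab].append(row[j])
            scores[j] += max(ks_statistic(u, v)
                             for u, v in combinations(groups, 2))
    return scores


def screen(X, y, keep, slice_counts=(3, 4, 5)):
    """Return the (sorted) indices of the `keep` highest-scoring features."""
    scores = fused_kolmogorov_scores(X, y, slice_counts)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def make_demo_data(n=200, p=50, seed=0):
    """Synthetic data (hypothetical): only feature 0 drives the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]
    return X, y


X_demo, y_demo = make_demo_data()
selected = screen(X_demo, y_demo, keep=5)
print("features kept after screening:", selected)
```

In a full pipeline along the lines described above, the indices returned by `screen` would define the reduced feature set handed to the random-forest-based RFE phase. For a categorical response, the class labels themselves could play the role of the slices, so no quantile slicing would be needed.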