Traditional statistical feature selection methods often struggle when applied to high-dimensional, low-sample-size data, encountering problems such as overfitting, the curse of dimensionality, computational infeasibility, and strong model assumptions. In this paper, we propose a novel two-step nonparametric approach called Deep Feature Screening (DeepFS) that overcomes these problems and identifies significant features with high precision for ultra-high-dimensional, low-sample-size data. The approach first extracts a low-dimensional representation of the input data and then applies feature screening based on the multivariate rank distance correlation recently developed by Deb and Sen (2021). By combining the strengths of deep neural networks and feature screening, DeepFS has the following appealing properties in addition to its ability to handle ultra-high-dimensional data with a small number of samples: (1) it is model-free and distribution-free; (2) it can be used for both supervised and unsupervised feature selection; and (3) it is capable of recovering the original input data. The superiority of DeepFS is demonstrated via extensive simulation studies and real data analyses.
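The two-step idea (extract a low-dimensional representation, then rank each original feature by its dependence on that representation) can be illustrated with a minimal sketch. This is not the DeepFS implementation. It makes two loudly simplifying assumptions: PCA computed by SVD stands in for the deep neural network, and the classical sample distance correlation stands in for the multivariate rank distance correlation of Deb and Sen (2021). All function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_dist(a):
    """Euclidean distance matrix between the rows of a (1-D input is treated as n x 1)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, z):
    """Classical (biased) sample distance correlation between x and z.

    Assumption: used here as a simple stand-in for the rank-based
    version referenced in the text.
    """
    a = double_center(pairwise_dist(x))
    b = double_center(pairwise_dist(z))
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def low_dim_representation(X, k):
    """Step 1 (stand-in): top-k principal component scores instead of a deep encoder."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def screen_features(X, k=2, top=5):
    """Step 2: score every feature against the representation and keep the top ones."""
    Z = low_dim_representation(X, k)
    scores = np.array([distance_correlation(X[:, j], Z) for j in range(X.shape[1])])
    selected = np.argsort(scores)[::-1][:top]
    return selected, scores


# Synthetic example: 60 samples, 200 features, only the first 5 carry signal.
rng = np.random.default_rng(0)
n, p, n_informative = 60, 200, 5
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :n_informative] = 3.0 * latent[:, None] + 0.1 * rng.normal(size=(n, n_informative))

selected, scores = screen_features(X, k=2, top=n_informative)
print(sorted(selected.tolist()))
```

In this sketch the screening step never fits a model relating features to a response, which mirrors the model-free flavor described above. Swapping the PCA step for a trained encoder, or the dependence measure for a rank-based one, would leave `screen_features` otherwise unchanged.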