Feature screening is an effective way to select active features from ultrahigh-dimensional data of increasing complexity; however, most existing feature screening approaches are either restricted to a univariate response or rely on distributional or model assumptions. In this article, we propose a novel sure independence screening approach based on the multivariate rank distance correlation (MrDc-SIS). MrDc-SIS has several desirable properties: it is distribution-free, completely nonparametric, scale-free, robust to outliers and heavy tails, and sensitive to hidden structures. Moreover, MrDc-SIS accommodates either univariate or multivariate responses and either one-dimensional or multi-dimensional predictors. We establish the asymptotic sure screening consistency property of MrDc-SIS under a mild condition by lifting previous finite-moment assumptions. Simulation studies demonstrate that MrDc-SIS outperforms three closely related approaches under various settings. We also apply MrDc-SIS to a multi-omics ovarian carcinoma dataset downloaded from The Cancer Genome Atlas (TCGA).
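As a rough illustration of the marginal screening idea described above, the following Python sketch ranks each predictor by a rank-based distance correlation with a possibly multivariate response and keeps the top `d`. It is not the MrDc-SIS estimator itself: componentwise ranks are used here as a simple stand-in for the multivariate ranks, and the function names (`componentwise_ranks`, `distance_correlation`, `screen`) and the cutoff `d` are hypothetical choices made for this example.

```python
import numpy as np


def componentwise_ranks(z):
    """Replace each column by its ranks scaled to (0, 1].

    Assumption: a simplified stand-in for multivariate ranks.
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    n = z.shape[0]
    # argsort of argsort gives 0-based ranks per column (no tie handling).
    ranks = np.argsort(np.argsort(z, axis=0), axis=0) + 1
    return ranks / n


def _centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between x and y."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen(X, Y, d):
    """Score every column of X against Y and return the d top-scoring indices."""
    Y_ranked = componentwise_ranks(Y)
    scores = np.array([
        distance_correlation(componentwise_ranks(X[:, j]), Y_ranked)
        for j in range(X.shape[1])
    ])
    keep = np.argsort(-scores)[:d]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 30))
    # Bivariate response: one linear and one quadratic (non-monotone) signal.
    Y = np.column_stack([
        2.0 * X[:, 0] + 0.2 * rng.standard_normal(300),
        X[:, 1] ** 2 + 0.2 * rng.standard_normal(300),
    ])
    keep, scores = screen(X, Y, d=5)
    print("selected predictors:", sorted(keep.tolist()))
```

Because the scores depend on the data only through ranks, rescaling any column of `X` or `Y` by a positive constant leaves the selection unchanged, which is the sense in which such a procedure is scale-free.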