Ultrahigh dimensional data sets are becoming increasingly prevalent in areas such as bioinformatics, medical imaging, and social network analysis. Sure independence screening is commonly used to analyze such data. Nevertheless, few methods exist for screening for interactions among predictors. Moreover, existing interaction screening methods prove highly inaccurate when applied to data sets in which predictors exhibit strong interactive effects, but weak marginal effects, on the response. We propose a new interaction screening procedure based on joint cumulants that is not subject to these limitations. Under a collection of sensible conditions, we demonstrate that our interaction screening procedure has the strong sure screening property. Four simulations are used to investigate the performance of our method relative to two other interaction screening methods. We also apply a two-stage analysis to a real data example by first employing our proposed method, and then further examining a subset of selected covariates using multifactor dimensionality reduction.