Selecting influential nonlinear interactive features from ultrahigh-dimensional data is an important task in many fields. However, statistical accuracy and computational feasibility are the two biggest concerns when more than half a million features are collected in practice. Many existing feature screening approaches either focus only on main effects or rely heavily on a heredity structure, which renders them ineffective when interaction effects are strong but main effects are weak. In this article, we propose a new interaction screening procedure based on the joint cumulant (named JCI-SIS). We show that the proposed procedure has strong sure screening consistency, which provides theoretical support for its performance. Simulation studies designed for both continuous and categorical predictors demonstrate the versatility and practicability of our JCI-SIS method. We further illustrate the power of JCI-SIS by applying it to screen 27,554,602,881 interaction pairs involving 234,754 single nucleotide polymorphisms (SNPs) for each of the 4,000 subjects collected from polycystic ovary syndrome (PCOS) patients and healthy controls.