A new method, called aggregated sure independence screening, is proposed to address the computational challenges of selecting interactions when the number of explanatory variables is much larger than the number of observations (i.e., $p\gg n$). In this problem, the two main challenges are the strong hierarchy restriction and the large number of candidate main effects and interactions. If $n$ is a few hundred and $p$ is ten thousand, the augmented matrix of the full model requires more than $100{\rm GB}$ of memory, which is beyond the capacity of a personal computer. Our proposed method resolves this issue, whereas competing methods do not. The method has two further advantages: it can include important interactions even if the related main effects are weak or absent, and it can be combined with any variable selection method for interactions. The research addresses the main concern in variable selection of interactions, because it makes previous methods applicable when $p$ is extremely large.
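The $100{\rm GB}$ figure can be checked with a rough calculation. This sketch assumes that the augmented matrix stores all $p$ main effects plus all $p(p-1)/2$ pairwise interaction columns as dense 8-byte (float64) entries, so its size is about $n\{p + p(p-1)/2\}\times 8$ bytes. It takes $n=300$ as an illustrative "few hundred". The function name `augmented_matrix_bytes` is hypothetical and does not come from the paper.

```python
def augmented_matrix_bytes(n: int, p: int, bytes_per_entry: int = 8) -> int:
    """Approximate memory of a dense n x (p + p(p-1)/2) design matrix.

    Assumes p main-effect columns plus all pairwise products x_j * x_k (j < k),
    each stored as a float64 (8 bytes) by default.
    """
    n_interactions = p * (p - 1) // 2  # number of pairs j < k
    n_columns = p + n_interactions     # main effects + interactions
    return n * n_columns * bytes_per_entry


if __name__ == "__main__":
    n, p = 300, 10_000  # illustrative sizes: a few hundred samples, ten thousand variables
    size_gb = augmented_matrix_bytes(n, p) / 1e9
    print(f"{size_gb:.1f} GB")
```

With these assumptions, the matrix has about $5\times 10^7$ columns and needs roughly $120$ GB, which is consistent with the "more than $100{\rm GB}$" stated above.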