Interaction selection for linear regression models with both continuous and categorical predictors is useful in many fields of modern science, yet very challenging when the number of predictors is relatively large. Existing interaction selection methods focus on finding one optimal model. While attractive properties such as consistency and the oracle property have been well established for such methods, they may perform poorly in terms of stability for high-dimensional data, and they typically do not handle categorical predictors. In this paper, we introduce a variable-importance-based interaction modeling (VIBIM) procedure for learning interactions in a linear regression model with both continuous and categorical predictors. It delivers multiple strong candidate models with high stability and interpretability. Simulation studies demonstrate its good finite-sample performance. We apply the VIBIM procedure to the coronavirus disease 2019 (COVID-19) data used in Tian et al. (2020) and measure the effects of relevant factors, including transmission control measures, on the spread of COVID-19. We show that the VIBIM approach leads to better models in terms of interpretability, stability, reliability, and prediction.