Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical frequentist statistical theory, which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of interest are those of the full model on all predictors, and the submodel view, in which the parameters of interest are those of the selected model only; we then focus on the latter. In the context of L1-penalized regression, we compare proposals for submodel inference (selective inference) via confidence intervals that are available to applied researchers through software packages, using a simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. Our findings indicate that the frequentist properties of selective confidence intervals are generally acceptable, but only the most conservative methods guarantee the desired coverage levels in all scenarios. The choice of inference method can have a large impact on the resulting interval estimates, so users must be acutely aware of the goal of inference in order to interpret and communicate the results. Currently available software packages are not yet very user-friendly or robust, which might affect their use in practice. In summary, we find submodel inference after selection useful for experienced statisticians who wish to assess the importance of individual selected predictors in future applications.
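Why classical intervals fail after selection can be illustrated with a small simulation. The sketch below is our own illustration, not one of the methods or software packages compared in the study. Under the submodel view, the target for a selected set M is the projection coefficient (X_M^T X_M)^{-1} X_M^T E[y], which equals 0 for every M when y is pure noise. The sketch assumes such a global null with p = 20 noise predictors. It selects the single variable that would enter a lasso path first (no intercept, columns scaled to unit norm). It then compares two intervals: a naive Wald interval computed on the same data, and sample splitting, which gives valid submodel intervals at the cost of using only half the data for inference. The sample sizes, seed, and helper names are arbitrary, hypothetical choices.

```python
import math
import random


def ols_slope_ci(x, y, z=1.96):
    """No-intercept simple regression slope with a classical
    normal-approximation Wald interval (ignores any prior selection)."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (len(y) - 1) / sxx)
    return b - z * se, b + z * se


def select_first_lasso_variable(X, y, rows):
    """Index of the column maximizing |x_j^T y| / ||x_j|| over `rows`.

    For a lasso without intercept on unit-norm columns, this is the first
    variable to enter the path, i.e. the selection just below lambda_max."""
    def score(j):
        num = sum(X[i][j] * y[i] for i in rows)
        den = math.sqrt(sum(X[i][j] ** 2 for i in rows))
        return abs(num) / den
    return max(range(len(X[0])), key=score)


def simulate_coverage(n=100, p=20, reps=500, seed=1):
    """Empirical coverage of the (zero) submodel target for both intervals."""
    rng = random.Random(seed)
    naive_hits = split_hits = 0
    for _ in range(reps):
        # Global null: y is pure noise, so every submodel target is 0.
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        all_rows = range(n)
        first, second = range(n // 2), range(n // 2, n)

        # Naive: select and build the interval on the same data.
        j = select_first_lasso_variable(X, y, all_rows)
        lo, hi = ols_slope_ci([X[i][j] for i in all_rows],
                              [y[i] for i in all_rows])
        naive_hits += lo <= 0 <= hi

        # Sample splitting: select on the first half, infer on the second.
        j = select_first_lasso_variable(X, y, first)
        lo, hi = ols_slope_ci([X[i][j] for i in second],
                              [y[i] for i in second])
        split_hits += lo <= 0 <= hi
    return naive_hits / reps, split_hits / reps


if __name__ == "__main__":
    naive, split = simulate_coverage()
    print(f"naive coverage: {naive:.2f}, sample-splitting coverage: {split:.2f}")
```

With 20 noise predictors, the naive interval for the selected variable covers 0 far less often than its nominal 95%, because selection favours the largest apparent effect. The split-sample interval stays close to nominal. The selective inference methods compared in the study aim to correct this without discarding half of the data for inference.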