Variable selection has become a pivotal choice in data analyses, one that impacts subsequent inference and prediction. In linear models, variable selection with Second-Generation P-Values (SGPV) has been shown to be as good as any other algorithm available to researchers. Here we extend the idea of Penalized Regression with Second-Generation P-Values (ProSGPV) to the generalized linear model (GLM) and Cox regression settings. The proposed ProSGPV extension is largely free of tuning parameters, adaptable to various regularization schemes and null bound specifications, and computationally fast. As in the linear case, it excels at support recovery and parameter estimation while maintaining strong prediction performance. The algorithm also performs as well as its competitors in the high-dimensional setting (p > n). Slight modifications of the algorithm improve its performance when data are highly correlated or when signals are dense. This work significantly strengthens the case for the ProSGPV approach to variable selection.
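To make the two-stage idea concrete, below is a minimal Python sketch of ProSGPV-style selection for logistic regression, one member of the GLM family covered here. It is illustrative only: the lasso screening stage is tuned by cross-validation and the null bound is taken as the average coefficient standard error from the unpenalized refit, both simplifying assumptions that may differ from the exact specification in the paper; the function name prosgpv_logistic is hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegressionCV

def prosgpv_logistic(X, y):
    """Two-stage ProSGPV-style variable selection for logistic
    regression (sketch). Returns indices of selected columns of X."""
    # Stage 1: lasso-penalized logistic regression to screen a
    # candidate set (CV-tuned here for simplicity; the paper's
    # lambda rule may differ).
    lasso = LogisticRegressionCV(
        penalty="l1", solver="saga", Cs=20, max_iter=5000
    ).fit(X, y)
    candidates = np.flatnonzero(lasso.coef_.ravel() != 0)
    if candidates.size == 0:
        return candidates

    # Stage 2: unpenalized GLM refit on the candidate set.
    Xc = sm.add_constant(X[:, candidates])
    fit = sm.GLM(y, Xc, family=sm.families.Binomial()).fit()
    beta, se = fit.params[1:], fit.bse[1:]  # drop the intercept

    # Null bound: average standard error over the candidates
    # (an assumed choice for this sketch).
    bound = se.mean()

    # Keep a variable when its 1-SE interval [beta - se, beta + se]
    # lies entirely outside the null region [-bound, bound],
    # i.e. its SGPV is zero.
    keep = (np.abs(beta) - se) > bound
    return candidates[keep]
```

Note that the second-stage refit carries no penalty, so the final coefficient estimates are free of the lasso's shrinkage bias; this is consistent with the parameter-estimation advantage claimed above.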