While shrinkage is essential in high-dimensional settings, its use for low-dimensional, regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and the coverage of confidence intervals. Much of the criticism stems from the use of standard shrinkage methods, such as the lasso and ridge with a single, cross-validated penalty. Our aim is to show that readily available alternatives can substantially improve predictive performance in terms of accuracy, calibration, or coverage. For linear regression, we illustrate this using small sample splits of a large, fairly typical epidemiological data set. We show that the use of differential ridge penalties for groups of covariates may enhance prediction accuracy, while calibration and coverage benefit from additional shrinkage of the penalties. In the logistic setting, we use an external simulation to demonstrate that local shrinkage improves calibration relative to global shrinkage, while providing better prediction accuracy than alternatives such as Firth's correction. The benefits of these alternative shrinkage methods are easily accessible via example implementations using \texttt{mgcv} and \texttt{r-stan}, including the estimation of multiple penalties. A synthetic copy of the large data set is shared for reproducibility.
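As a minimal sketch of the multi-penalty ridge idea sketched above, the following uses \texttt{mgcv}'s \texttt{paraPen} mechanism to estimate a separate ridge penalty per covariate group by REML. The data \texttt{y}, \texttt{X} and the group sizes \texttt{p1}, \texttt{p2} are hypothetical placeholders for illustration, not drawn from the paper's data set:

\begin{verbatim}
library(mgcv)

## Toy data: two covariate groups with different signal strengths
set.seed(1)
n <- 100; p1 <- 5; p2 <- 5
X <- matrix(rnorm(n * (p1 + p2)), n)   # columns 1..p1 form group 1
beta <- c(rep(1, p1), rep(0.1, p2))    # group 1 carries most signal
y <- drop(X %*% beta) + rnorm(n)

## One identity penalty block per group, zero elsewhere, so each
## smoothing parameter shrinks only its own block of coefficients
S1 <- diag(c(rep(1, p1), rep(0, p2)))
S2 <- diag(c(rep(0, p1), rep(1, p2)))

## REML estimates a separate ridge penalty for each group
fit <- gam(y ~ X - 1, paraPen = list(X = list(S1, S2)),
           method = "REML")
fit$sp   # the two estimated penalties; group 1 should be shrunk less
\end{verbatim}

Supplying several penalty matrices for one parametric term lets \texttt{mgcv} tune each group's amount of shrinkage from the data, rather than imposing a single global penalty.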