Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection qualities further, several modifications of the original algorithm have been developed. These mainly focus on different stopping criteria and leave the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them with regard to their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, and the prediction error was lowered in a real-world application with age-standardized COVID-19 incidence rates.
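To make the idea of swapping the selection rule concrete, the following is a minimal sketch for the Gaussian case only, not the authors' implementation. The function name `componentwise_boost`, the `rule` argument, and in particular the degrees-of-freedom approximation (intercept plus number of active components) are assumptions made for illustration; the abstract does not specify how the AIC is computed. The `"classic"` rule picks the component whose base learner best fits the negative gradient, while the `"aic"` rule picks the component whose update yields the lowest approximate AIC.

```python
import numpy as np

def componentwise_boost(X, y, n_iter=200, nu=0.1, rule="aic"):
    """Component-wise L2 boosting with linear base learners (illustrative sketch).

    rule="classic": select the component that minimises the residual sum of squares.
    rule="aic":     select the component that minimises n*log(RSS/n) + 2*df, where
                    df is ASSUMED to be 1 (intercept) + number of active components.
    """
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centred covariates
    col_ss = (Xc ** 2).sum(axis=0)       # sum of squares per column
    coef = np.zeros(p)
    fit = np.full(n, y.mean())           # offset: intercept-only model
    for _ in range(n_iter):
        resid = y - fit                  # negative gradient of squared-error loss
        beta = Xc.T @ resid / col_ss     # least-squares slope of each base learner
        # RSS after a damped update (step length nu) with each candidate component
        rss = resid @ resid - (2 * nu - nu ** 2) * beta ** 2 * col_ss
        rss = np.maximum(rss, np.finfo(float).tiny)
        if rule == "aic":
            # a component that is not yet active costs one extra degree of freedom
            df = 1 + np.count_nonzero(coef) + (coef == 0)
            score = n * np.log(rss / n) + 2 * df
        else:
            score = rss
        j = int(np.argmin(score))
        coef[j] += nu * beta[j]
        fit += nu * beta[j] * Xc[:, j]
    intercept = y.mean() - x_mean @ coef
    return intercept, coef

# Simulated data: two informative covariates, eight noise covariates (hypothetical setup)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

_, coef_classic = componentwise_boost(X, y, rule="classic")
_, coef_aic = componentwise_boost(X, y, rule="aic")
print("selected (classic):", np.flatnonzero(coef_classic))
print("selected (aic):    ", np.flatnonzero(coef_aic))
```

Under this approximation, a new component enters the model only when its reduction in the residual sum of squares outweighs the penalty of one additional degree of freedom, which is one way a prediction-based rule can yield sparser models than the classic rule for the same number of iterations.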