Generalized additive partial linear models (GAPLMs) are appealing for model interpretation and prediction. However, for GAPLMs, the covariates and the degree of smoothing in the nonparametric parts are often difficult to determine in practice. To address this model selection uncertainty, we develop a computationally feasible model averaging (MA) procedure. The model weights are data-driven and selected by multifold cross-validation (CV), rather than leave-one-out CV, to reduce computational cost. When all the candidate models are misspecified, we show that the proposed MA estimator for GAPLMs is asymptotically optimal in the sense of achieving the lowest possible Kullback-Leibler loss. In the alternative scenario, where the candidate model set contains at least one correct model, the weights chosen by multifold CV are asymptotically concentrated on the correct models. As a by-product, we propose a variable importance measure, based on the MA weights, to quantify the importance of predictors in GAPLMs; this measure is shown to asymptotically identify the variables in the true model. Moreover, a model screening method is provided for the case where the number of candidate models is very large. Numerical experiments demonstrate the superiority of the proposed MA method over several existing model averaging and selection methods.
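The weight-selection idea above can be sketched in a minimal toy form. This is a hypothetical illustration, not the paper's exact algorithm: the candidate models below are plain linear fits over covariate subsets (standing in for GAPLM candidates), the loss is squared error (standing in for the Kullback-Leibler loss), and the simplex-constrained weight choice is done by a coarse grid search rather than a proper constrained optimization.

```python
import numpy as np

# Simulated data: the true model uses only the first two covariates.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Candidate models: different covariate subsets (the source of
# model selection uncertainty).
candidates = [[0], [0, 1], [0, 1, 2]]

def fit_predict(cols, X_tr, y_tr, X_te):
    """OLS fit on a training fold, prediction on the held-out fold."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    B = np.column_stack([np.ones(len(X_te)), X_te[:, cols]])
    return B @ beta

def cv_predictions(K=5):
    """Out-of-fold predictions for every candidate (multifold CV)."""
    P = np.zeros((n, len(candidates)))
    for te in np.array_split(np.arange(n), K):
        tr = np.setdiff1d(np.arange(n), te)
        for m, cols in enumerate(candidates):
            P[te, m] = fit_predict(cols, X[tr], y[tr], X[te])
    return P

P = cv_predictions()

# Choose weights on the simplex minimizing the CV loss, here by a
# coarse grid search over two free coordinates.
best_w, best_loss = None, np.inf
grid = np.linspace(0.0, 1.0, 21)
for w0 in grid:
    for w1 in grid:
        if w0 + w1 > 1.0:
            continue
        w = np.array([w0, w1, 1.0 - w0 - w1])
        loss = np.mean((y - P @ w) ** 2)
        if loss < best_loss:
            best_w, best_loss = w, loss

print("CV-selected weights:", np.round(best_w, 2))
```

Because the grid includes the simplex vertices, the averaged estimator's CV loss is never worse than that of any single candidate; the MA-weight-based variable importance of the abstract would then aggregate these weights over the candidates containing each covariate.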