Selecting the smoothing parameter is central to estimating penalized splines. The best value of the smoothing parameter is usually the one that optimizes a smoothness selection criterion, such as the generalized cross-validation (GCV) error or the restricted likelihood (REML). To identify the global optimum rather than become trapped in an undesired local optimum, grid search is the recommended optimization method. Unfortunately, grid search requires a pre-specified search interval that contains the unknown global optimum, and no guideline exists for choosing this interval. As a result, practitioners have to find it by trial and error. To overcome this difficulty, we develop novel algorithms that find the interval automatically. Our automatic search interval has four advantages. (i) It specifies a smoothing parameter range over which the associated penalized least squares problem is numerically solvable. (ii) It is criterion-independent, so different criteria, such as GCV and REML, can be explored on the same parameter range. (iii) It is wide enough to contain the global optimum of any criterion, so that, for example, the global minimum of GCV and the global maximum of REML can both be identified. (iv) It is computationally cheap compared with the grid search itself, so it carries no extra computational burden in practice. Our method is ready to use through our recently developed R package gps (version >= 1.1). It may also be embedded in more advanced statistical modeling methods that rely on penalized splines.
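To make the grid-search setting concrete, the following is a minimal sketch of selecting a smoothing parameter by minimizing GCV over a grid of log(lambda) values. It is an illustration under stated assumptions, not the gps package's interface and not the interval-finding algorithm described above: it uses a simple difference-penalty smoother in place of a penalized B-spline, the function name `gcv_grid_search` is hypothetical, and the search interval [-6, 12] is hand-picked, which is exactly the trial-and-error step the automatic interval is meant to replace.

```python
import numpy as np

def gcv_grid_search(y, log_lambda_grid):
    """Evaluate GCV on a grid of log(lambda) values (hypothetical helper).

    Smoother (an assumption for illustration): minimize
    ||y - f||^2 + lambda * ||D f||^2, with D the second-order difference matrix.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # second differences, shape (n - 2, n)
    P = D.T @ D                           # penalty matrix
    scores = []
    for rho in log_lambda_grid:
        lam = np.exp(rho)
        # Hat matrix H = (I + lambda * P)^{-1}, so fitted values are H @ y
        H = np.linalg.solve(np.eye(n) + lam * P, np.eye(n))
        fitted = H @ y
        rss = np.sum((y - fitted) ** 2)   # residual sum of squares
        edf = np.trace(H)                 # effective degrees of freedom
        scores.append(n * rss / (n - edf) ** 2)
    scores = np.array(scores)
    best = log_lambda_grid[int(np.argmin(scores))]
    return best, scores

# Simulated data; the search interval below is a hand-picked assumption.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(-6.0, 12.0, 37)
best_rho, gcv = gcv_grid_search(y, grid)
print("log(lambda) minimizing GCV on the grid:", best_rho)
```

If the minimizer lands on either end of the grid, the hand-picked interval was probably too narrow and has to be widened and searched again.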