Selecting the smoothing parameter is central to estimating penalized splines. The best parameter value is usually taken to be the one that optimizes a smoothness selection criterion, for example the minimizer of the generalized cross-validation error (GCV) or the maximizer of the restricted likelihood (REML). To avoid settling on an undesired local extremum instead of the global one, the optimization should be done by grid search. Unfortunately, grid search requires a pre-specified search interval that contains the unknown global extremum, and there has been no theory on how to provide such an interval. As a result, practitioners have to find it by trial and error. To overcome this difficulty, we develop novel algorithms that find the interval automatically. Our automatic search interval has four advantages. (i) It specifies a smoothing parameter range over which the penalized least squares problem is numerically solvable. (ii) It is criterion-independent, so different criteria such as GCV and REML can be explored on the same parameter range. (iii) It is wide enough to contain the global extremum of any criterion, so that, for example, the global minimum of GCV and the global maximum of REML can both be identified. (iv) It is computationally cheap compared with the grid search itself, so it adds no appreciable cost in practice. Our method is ready to use through the R package gps (version 1.1 or later). It may also be embedded in other advanced statistical modeling methods that rely on penalized splines.
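To make the grid-search setting concrete, here is a minimal Python sketch of minimizing GCV over a hand-picked interval of log smoothing parameters. It is not the algorithm from the gps package. It assumes the simplest penalized least squares smoother (an identity basis with a second-order difference penalty, that is, a Whittaker smoother) rather than a full B-spline basis. The function name `gcv_grid_search` and the interval endpoints are hypothetical and chosen only for illustration.

```python
import numpy as np


def gcv_grid_search(y, log_lambda_min, log_lambda_max, num=50, diff_order=2):
    """Minimize GCV on an equidistant grid of log(lambda) values.

    Illustrative assumption: the smoother solves
        min_f ||y - f||^2 + lambda * ||D f||^2,
    where D is a difference matrix (a Whittaker smoother), not a B-spline fit.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # (n - diff_order) x n difference matrix D.
    D = np.diff(np.eye(n), n=diff_order, axis=0)
    # Eigen-decompose the penalty once; every lambda then costs O(n).
    s, U = np.linalg.eigh(D.T @ D)
    s = np.clip(s, 0.0, None)  # remove tiny negative round-off
    z = U.T @ y

    grid = np.linspace(log_lambda_min, log_lambda_max, num)
    gcv = np.empty(num)
    for i, rho in enumerate(grid):
        shrink = 1.0 / (1.0 + np.exp(rho) * s)   # eigenvalues of the hat matrix
        edf = shrink.sum()                       # effective degrees of freedom
        rss = np.sum(((1.0 - shrink) * z) ** 2)  # residual sum of squares
        gcv[i] = n * rss / (n - edf) ** 2
    best = int(np.argmin(gcv))
    return grid[best], grid, gcv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
    # The interval [-5, 15] is a guess made by hand.
    best_rho, grid, gcv = gcv_grid_search(y, -5.0, 15.0)
    print("log(lambda) minimizing GCV on the grid:", best_rho)
```

In this sketch the endpoints are supplied by the caller. If the minimizer lands on either end of the grid, the interval was probably too narrow and has to be widened by hand, which is the trial-and-error step that an automatic search interval is meant to remove.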