Cross-validation is the standard approach for tuning parameter selection in many non-parametric regression problems. However, its use is less common in change-point regression, perhaps because its prediction-error-based criterion may appear to permit small spurious changes and hence be less well suited to estimating the number and locations of change-points. We show that the problems with cross-validation under squared error loss are in fact more severe: it can systematically under- or over-estimate the number of change-points, and estimate the mean function highly suboptimally, even in simple settings where the changes are easily detectable. We propose two simple remedies, the first using absolute error rather than squared error loss, and the second modifying the holdout sets used. For the latter, we give conditions under which the number of change-points is estimated consistently for a general change-point estimation procedure. We show that these conditions are satisfied for optimal partitioning, using new results on its performance when supplied with an incorrect number of change-points. Numerical experiments show that the absolute error approach in particular is competitive with common change-point methods using classical tuning parameter choices when the error distribution is well specified, and can substantially outperform them under misspecification. An implementation of our methodology is available in the R package crossvalidationCP on CRAN.
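The absolute-error cross-validation idea can be illustrated with a minimal sketch. This is not the authors' implementation (the crossvalidationCP package provides that); it is an assumed toy setup using an odd/even sample split, a generic dynamic-programming optimal partitioning routine, and holdout scoring under a chosen loss:

```python
import numpy as np

def op_segment(y, n_segments):
    """Optimal partitioning: split y into n_segments pieces minimising the
    within-segment sum of squared errors, via dynamic programming."""
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))       # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))  # prefix sums of squares

    def sse(i, j):  # cost of fitting a constant mean to y[i:j]
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    dp = np.full((n_segments + 1, n + 1), np.inf)
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = dp[k - 1, i] + sse(i, j)
                if c < dp[k, j]:
                    dp[k, j], back[k, j] = c, i
    # Trace back the segment start points; drop the leading 0.
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k, j]
        bounds.append(j)
    return sorted(bounds)[1:]  # interior change-point locations

def cv_choose_K(y, K_max, loss=np.abs):
    """Choose the number of change-points by odd/even-split cross-validation,
    scoring holdout points with the given loss (absolute error by default)."""
    train, test = y[::2], y[1::2]  # assumes len(y) is even
    errs = []
    for K in range(K_max + 1):
        cps = op_segment(train, K + 1)
        edges = [0] + cps + [len(train)]
        err = 0.0
        for a, b in zip(edges[:-1], edges[1:]):
            # Predict each holdout point by the neighbouring training mean.
            err += loss(test[a:b] - train[a:b].mean()).sum()
        errs.append(err)
    return int(np.argmin(errs))
```

Passing `loss=np.square` instead recovers the squared-error criterion whose failure modes the abstract describes, so the two variants can be compared directly on simulated data.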