Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches designed to handle serial dependence, such as leave-future-out (LFO), h-block, and hv-block. Existing large-sample results show that both specialized and generic methods are applicable to models of serially dependent data. However, large-sample consistency results overlook the impact of sampling variability on accuracy in finite samples. Moreover, the accuracy of a CV scheme depends on many aspects of the procedure, and we show that poor design choices can lead to elevated rates of adverse selection. In this paper, we consider the problem of identifying the regression component of an important class of models for serially dependent data: autoregressions of order p with q exogenous regressors, ARX(p,q), evaluated under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and the model selection statistic. We conclude with recommendations for practitioners.
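To make the distinction between the joint and pointwise log scores concrete, the following is a minimal sketch rather than the estimators studied in the paper. It assumes a zero-mean stationary Gaussian AR(1) process with known parameters and no conditioning on training data, whereas the CV estimators discussed above use Bayesian predictive densities. The function names (`ar1_covariance`, `joint_log_score`, `pointwise_log_score`) and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def ar1_covariance(n, phi, sigma):
    """Stationary AR(1) covariance: sigma^2 / (1 - phi^2) * phi^|i - j|."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma**2 / (1.0 - phi**2) * phi**lags

def joint_log_score(y, cov):
    """Log density of the whole block y under a zero-mean multivariate normal."""
    n = len(y)
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y)  # whitened observations
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)

def pointwise_log_score(y, cov):
    """Sum of marginal normal log densities; off-diagonal dependence is ignored."""
    var = np.diag(cov)
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + y**2 / var)))

# Assumed illustrative values, not taken from the paper.
rng = np.random.default_rng(0)
phi, sigma, n = 0.8, 1.0, 5
cov = ar1_covariance(n, phi, sigma)
y = np.linalg.cholesky(cov) @ rng.standard_normal(n)  # one simulated block

joint = joint_log_score(y, cov)
pointwise = pointwise_log_score(y, cov)
print(f"joint: {joint:.3f}  pointwise: {pointwise:.3f}")
```

When `phi` is zero the covariance is diagonal and the two scores coincide. With nonzero `phi` they generally differ, because the pointwise sum discards the dependence between held-out observations.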