Increasingly high-dimensional data sets require that estimation methods not only satisfy statistical guarantees but also remain computationally feasible. In this context, we consider $ L^{2} $-boosting via orthogonal matching pursuit in a high-dimensional linear model and analyze a data-driven early stopping time $ \tau $ of the algorithm, which is sequential in the sense that its computation is based on the first $ \tau $ iterations only. This approach is much less costly than established model selection criteria, which require the computation of the full boosting path. We prove that sequential early stopping preserves statistical optimality in this setting in terms of a fully general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. Finally, an extensive simulation study shows that, at a greatly reduced computational cost, the performance of this type of method is on par with other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path.
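
To make the idea of a sequential stopping time concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the stopping rule, so the sketch assumes a residual-based rule: iterate orthogonal matching pursuit and stop at the first iteration $ m $ at which the empirical squared residual norm falls to or below a threshold. The function name `omp_early_stopping`, the threshold parameter `kappa`, and the cap `max_iter` are hypothetical names introduced for illustration only.

```python
import numpy as np


def omp_early_stopping(X, y, kappa, max_iter=None):
    """Orthogonal matching pursuit with an ASSUMED residual-based stopping rule.

    Stops at the first iteration m where ||y - X beta_m||^2 / n <= kappa.
    Returns (beta, selected, tau): the coefficient vector, the list of
    selected column indices, and the number of iterations performed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if max_iter is None:
        max_iter = min(n, p)

    # Column norms, used to normalize the correlations in the greedy step.
    col_norms = np.linalg.norm(X, axis=0)
    col_norms[col_norms == 0] = 1.0  # guard against all-zero columns

    selected = []
    beta = np.zeros(p)
    residual = y.copy()

    for m in range(max_iter + 1):
        # Sequential check: uses only quantities from the first m iterations.
        empirical_risk = residual @ residual / n
        if empirical_risk <= kappa or m == max_iter:
            return beta, selected, m

        # Greedy step: pick the column most correlated with the residual.
        corr = np.abs(X.T @ residual) / col_norms
        if selected:
            corr[selected] = -np.inf  # never reselect a column
        j = int(np.argmax(corr))
        selected.append(j)

        # Orthogonal step: least-squares refit on all selected columns.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta = np.zeros(p)
        beta[selected] = coef
        residual = y - X[:, selected] @ coef


if __name__ == "__main__":
    # Simulated sparse linear model; all settings here are illustrative.
    rng = np.random.default_rng(0)
    n, p, sigma = 200, 500, 0.5
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 3.0
    y = X @ beta_true + sigma * rng.standard_normal(n)

    # Assumption: the noise level is known and used as the threshold.
    beta_hat, selected, tau = omp_early_stopping(X, y, kappa=sigma**2)
    print("stopped after", tau, "iterations; selected columns:", sorted(selected))
```

Because the loop returns as soon as the check succeeds, only the first $ \tau $ iterations are ever computed, in contrast to criteria that evaluate the full boosting path before choosing an iteration. In practice the noise level is unknown, so using it as the threshold is a simplification of this sketch.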