An important issue in many multivariate regression problems is eliminating candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are both large, model selection faces a scalability challenge. Knock-one-out (KOO) statistics have the potential to meet this challenge. In this paper, the strong consistency and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions on the errors (finite fourth moments). These theoretical results lead us to propose a subset selection rule based on the KOO statistics with a bootstrap threshold. Simulation results support our conclusions and demonstrate that the selection probabilities of the KOO approach with the bootstrap threshold outperform those of methods using the Akaike information threshold, the Bayesian information threshold, and Mallows' C$_p$ threshold. We apply the proposed KOO approach, alongside the information-threshold methods, to a chemometrics dataset and a yeast cell-cycle dataset; the results suggest that our proposed method identifies useful models.
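To make the idea concrete, the following is a minimal sketch of a knock-one-out selection rule with a bootstrap threshold. All specifics here are illustrative assumptions, not the paper's construction: the KOO statistic is taken as the relative increase of the determinant of the residual sum-of-squares matrix when one predictor is removed from the full model, and the threshold is calibrated by a simple permutation-resampling scheme; toy dimensions and signal strengths are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate regression Y = X B + E (sizes are illustrative only).
n, p, q = 200, 10, 5                  # samples, candidate predictors, responses
X = rng.standard_normal((n, p))
B = np.zeros((p, q))
B[:3] = 1.0                           # only the first 3 predictors are non-null
Y = X @ B + rng.standard_normal((n, q))

def rss_det(X, Y):
    """Determinant of the residual sum-of-squares matrix after OLS."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta
    return np.linalg.det(R.T @ R)

def koo_stats(X, Y):
    """Assumed KOO statistic for each predictor j: relative increase of
    |RSS| when predictor j is knocked out of the full model."""
    full = rss_det(X, Y)
    return np.array([rss_det(np.delete(X, j, axis=1), Y) / full - 1.0
                     for j in range(X.shape[1])])

stats = koo_stats(X, Y)

# Bootstrap-style threshold (illustrative): recompute the statistics with
# the rows of Y permuted to mimic the null, and take a high quantile of
# the maxima over predictors.
boot_max = [koo_stats(X, Y[rng.permutation(n)]).max() for _ in range(200)]
threshold = np.quantile(boot_max, 0.95)

selected = np.flatnonzero(stats > threshold)
print("selected predictors:", selected)
```

With a strong toy signal like this, the statistics of the three non-null predictors should stand well above the permutation-calibrated threshold, while the null predictors' statistics fall below it.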