Ensemble methods such as bagging and random forests are ubiquitous in fields ranging from finance to genomics. However, the question of how to tune ensemble parameters efficiently has received relatively little attention. In this paper, we propose a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes of randomized ensembles. Our method builds on two main ingredients: two initial estimators for small ensemble sizes based on out-of-bag errors, and a novel risk extrapolation technique that leverages the structure of the prediction risk decomposition. By establishing uniform consistency over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, requires only mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As an illustrative example, we employ ECV to predict surface protein abundances from gene expression in single-cell multiomics using random forests. Compared to sample-split cross-validation and K-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. Meanwhile, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Further numerical results demonstrate the finite-sample accuracy of ECV for several common ensemble predictors.
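To make the extrapolation idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard decomposition of squared prediction risk for an average of M exchangeable predictors, $R(M) = a + b/M$, so that risk estimates at M = 1 and M = 2 determine the whole curve. The function names, the inputs `r1` and `r2` (standing in for out-of-bag risk estimates), and the search cap `m_max` are hypothetical; the subsample size is held fixed here, whereas ECV also tunes it.

```python
def extrapolate_risk(r1, r2, m):
    """Extrapolate squared risk to ensemble size m from estimates at sizes 1 and 2.

    Assumes R(M) = a + b / M, fitted exactly through R(1) = r1 and R(2) = r2.
    """
    b = 2.0 * (r1 - r2)   # component of the risk that decays as 1/M
    a = 2.0 * r2 - r1     # limiting risk as M grows without bound
    return a + b / m


def smallest_delta_optimal_size(r1, r2, delta, m_max=1000):
    """Smallest m <= m_max whose extrapolated risk is within delta of the limit.

    Hypothetical helper: for a fixed subsample size only. Returns m_max if no
    smaller size meets the tolerance.
    """
    limit_risk = 2.0 * r2 - r1
    for m in range(1, m_max + 1):
        if extrapolate_risk(r1, r2, m) <= limit_risk + delta:
            return m
    return m_max


if __name__ == "__main__":
    # Illustrative inputs only, not results from the paper.
    r1, r2 = 1.0, 0.75
    print(extrapolate_risk(r1, r2, 4))
    print(smallest_delta_optimal_size(r1, r2, delta=0.125))
```

Under this assumption only ensembles of size one and two need to be fitted and evaluated, which is the intuition behind the lower computational cost relative to K-fold cross-validation over a grid of ensemble sizes.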