Infinite-order U-statistics (IOUS) have been used extensively to quantify the uncertainty of subbagging ensemble learning algorithms such as random forests. While normality results for IOUS have been studied extensively, variance estimation approaches and their theoretical properties remain mostly unexplored. Existing approaches mainly rely on the leading term dominance property of the Hoeffding decomposition. However, this view usually leads to biased estimation when the kernel size is large or the sample size is small. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties, especially ratio consistency, have never been studied. As a result, the performance of the constructed confidence intervals is not guaranteed. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator's finite sample performance. Extensive simulation studies show that our estimators enjoy lower bias and achieve the targeted coverage rates.
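For orientation, the Hoeffding decomposition of the variance mentioned in the abstract can be written in its classical textbook form. This is a sketch of the standard identity for a U-statistic, not the estimator proposed in the paper. The symbols $n$ (sample size), $k$ (kernel or subsample size), $h$ (symmetric kernel), $S, S'$ (size-$k$ subsamples) and $\xi_{d,k}^2$ are notation assumed here for illustration.

\[
\operatorname{Var}(U_n) \;=\; \binom{n}{k}^{-1} \sum_{d=1}^{k} \binom{k}{d}\binom{n-k}{k-d}\, \xi_{d,k}^2,
\qquad
\xi_{d,k}^2 \;=\; \operatorname{Cov}\bigl(h(S),\, h(S')\bigr) \ \text{ with } \ |S \cap S'| = d .
\]

The weights $\binom{k}{d}\binom{n-k}{k-d}\big/\binom{n}{k}$ are hypergeometric probabilities with mean $k^2/n$. When $k^2/n$ is small, the weight at $d = 1$ dominates the weights at $d \ge 2$ and is approximately $k^2/n$, which motivates the leading term approximation $\operatorname{Var}(U_n) \approx (k^2/n)\,\xi_{1,k}^2$. When $k$ is large or $n$ is small, the weights shift toward larger $d$, which is consistent with the bias of the leading term view described in the abstract.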