We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables simple plug-in estimation of key prediction uncertainty metrics, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, for random forests and many variants. Our approach is especially well-adapted for prediction interval estimation; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution function. The estimators introduced here are implemented in the R package forestError.
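To make the plug-in idea concrete, the following is a minimal sketch of how summary metrics could be read off an estimated prediction error distribution once that distribution is available as a weighted sample of errors. It is not the forestError API and not the estimator defined in the paper: the function names `weighted_quantile` and `plugin_summaries`, the `weights` input, and the sign convention (error = observed response minus forest prediction) are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple, Union


def weighted_quantile(errors: Sequence[float],
                      weights: Sequence[float],
                      p: float) -> float:
    """Smallest error whose weighted empirical CDF value is at least p."""
    total = sum(weights)
    cumulative = 0.0
    # Walk through the errors in increasing order, accumulating weight.
    for error, weight in sorted(zip(errors, weights)):
        cumulative += weight
        if cumulative / total >= p:
            return error
    return max(errors)


def plugin_summaries(errors: Sequence[float],
                     weights: Sequence[float],
                     prediction: float,
                     alpha: float = 0.1
                     ) -> Dict[str, Union[float, Tuple[float, float]]]:
    """Plug-in bias, MSPE, and a (1 - alpha) interval from a weighted error sample.

    Assumption: `errors` are (observed - predicted) values and `weights`
    describe how relevant each error is to the test point of interest.
    """
    total = sum(weights)
    # Weighted mean of the errors estimates the conditional bias.
    bias = sum(w * e for e, w in zip(errors, weights)) / total
    # Weighted mean of the squared errors estimates the conditional MSPE.
    mspe = sum(w * e * e for e, w in zip(errors, weights)) / total
    # Shift the point prediction by the lower and upper error quantiles.
    lower = prediction + weighted_quantile(errors, weights, alpha / 2)
    upper = prediction + weighted_quantile(errors, weights, 1 - alpha / 2)
    return {"bias": bias, "mspe": mspe, "interval": (lower, upper)}
```

Under these assumptions, every metric is a functional of the same estimated error distribution, so a single estimate of that distribution yields the bias, the mean squared prediction error, and the interval endpoints. How the weights are constructed for a given test point is left open in this sketch.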