Like many predictive models, random forests provide a point prediction for a new observation. Beyond the point prediction, it is important to quantify its uncertainty, and prediction intervals convey how reliable a point prediction is. We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods for building prediction intervals with random forests and boosted forests. The package implements a new method for building prediction intervals with boosted forests (PIBF) and 15 variants for producing prediction intervals with random forests proposed by Roy and Larocque (2020). We conduct an extensive simulation study and real data analyses to compare the performance of the proposed method with ten existing methods for building prediction intervals with random forests. The results show that the proposed method is very competitive and, overall, outperforms the competing methods.