Predictions and forecasts from machine learning models should take the form of probability distributions, so that more information is communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models are becoming more frequent in academia and industry, the related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review spans the period from the introduction of early statistical models (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms), which are more flexible by nature. Reviewing the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements apply a few fundamental concepts to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming active topics of research.
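As a small illustration of the "consistent scoring function" idea mentioned in the abstract, the sketch below uses the quantile (pinball) loss, a standard scoring function for quantile predictions. It is not taken from the paper: the function names, the simulated Gaussian sample, and the grid search are all assumptions made for this example. The point it tries to show is that the constant prediction minimizing the average pinball loss at level `tau` lands close to the empirical `tau`-quantile of the data.

```python
import random


def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicted quantile q for observation y at level tau."""
    diff = y - q
    # Under-prediction (y >= q) is weighted by tau, over-prediction by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1) * diff


def mean_pinball_loss(ys, q, tau):
    """Average pinball loss of a single constant prediction q over a sample."""
    return sum(pinball_loss(y, q, tau) for y in ys) / len(ys)


# Hypothetical data: a simulated standard normal sample (illustration only).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
tau = 0.9

# Search a coarse grid of candidate predictions for the one with the lowest average loss.
candidates = [i / 100 for i in range(-300, 301)]
best = min(candidates, key=lambda q: mean_pinball_loss(sample, q, tau))

# Empirical tau-quantile of the sample, taken as an order statistic.
empirical = sorted(sample)[int(tau * len(sample)) - 1]

print(f"grid minimizer of the mean pinball loss: {best:.2f}")
print(f"empirical {tau:.0%} quantile:              {empirical:.2f}")
```

Because the average pinball loss is convex in `q`, the grid minimizer should sit within about one grid step of the empirical quantile. This is the sense in which the loss is consistent for the quantile: issuing the true quantile is the best strategy on average.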