Quantile regression is a fundamental problem in statistical learning, motivated by the need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from an accurate quantification of the range of possible values. Accordingly, many models have been developed for this problem over many years of research in econometrics, statistics, and machine learning. Rather than proposing yet another algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models in order to improve accuracy and robustness. We consider weighted ensembles in which the weights may vary not only over individual models, but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and we apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
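To make the aggregation and post-sorting ideas concrete, the following is a minimal sketch and not the implementation used in the paper. It combines two hypothetical base models using weights that vary over both models and quantile levels, sorts the aggregated quantiles to restore monotonic ordering, and compares the summed pinball loss before and after sorting. The function names, the toy predictions, and the weights are all illustrative assumptions, and the summed pinball loss is used here as a stand-in for the weighted interval score, to which it is closely related for symmetric quantile levels.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of prediction q for outcome y at level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)


def total_pinball(y, quantiles, taus):
    """Sum of pinball losses over all quantile levels."""
    return sum(pinball_loss(y, q, t) for q, t in zip(quantiles, taus))


def aggregate(base_preds, weights):
    """Weighted ensemble with one weight per (model, quantile level) pair.

    base_preds[j][k] is the prediction of model j at quantile level k.
    weights[j][k] is the weight of model j at level k; in this sketch the
    weights at each level are assumed to sum to one over the models.
    """
    n_levels = len(base_preds[0])
    return [
        sum(w[k] * p[k] for w, p in zip(weights, base_preds))
        for k in range(n_levels)
    ]


# Illustrative inputs only (assumed values, not taken from the paper).
taus = [0.1, 0.5, 0.9]
base_preds = [
    [1.0, 2.0, 3.0],  # hypothetical model A, correctly ordered
    [4.0, 2.5, 2.0],  # hypothetical model B, with crossing quantiles
]
weights = [
    [0.2, 0.5, 0.9],  # weights for model A at each level
    [0.8, 0.5, 0.1],  # weights for model B at each level
]

raw = aggregate(base_preds, weights)  # aggregated quantiles may cross
sorted_q = sorted(raw)                # post sorting restores monotonicity

y = 2.0  # an assumed observed outcome
loss_raw = total_pinball(y, raw, taus)
loss_sorted = total_pinball(y, sorted_q, taus)
print(f"raw quantiles:    {raw}, loss = {loss_raw:.4f}")
print(f"sorted quantiles: {sorted_q}, loss = {loss_sorted:.4f}")
```

In this toy setting the aggregated quantiles cross, and sorting them does not increase the summed pinball loss for the chosen outcome, which is in line with the result stated in the abstract. Feature-dependent weights and conformal calibration are not covered by this sketch.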