Quantile regression is a fundamental problem in statistical learning, motivated by the need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from the ability to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in statistics, machine learning, and related fields. Rather than proposing yet another algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary not only over individual models, but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post hoc sorting or isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
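As a minimal illustration of the monotonicity result mentioned above, the sketch below (not the paper's code; the quantile grid and data are assumed for demonstration) computes the average pinball loss, which is proportional to the weighted interval score, for a set of unordered quantile predictions and again after sorting them across quantile levels. The sorted predictions should never score worse.

```python
# Minimal sketch, assuming a synthetic setup: post-sorting predicted quantiles
# across levels does not increase the average pinball loss (and hence the
# weighted interval score, which is proportional to it).
import numpy as np

def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of a predicted quantile q at level tau for outcome y."""
    return np.where(y >= q, tau * (y - q), (1.0 - tau) * (q - y))

def avg_pinball(y, quantile_preds, taus):
    """Average pinball loss over the quantile grid; proportional to the WIS."""
    return np.mean([pinball_loss(y, q, t) for q, t in zip(quantile_preds, taus)], axis=0)

rng = np.random.default_rng(0)
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])        # assumed quantile grid
y = rng.normal(size=1000)                            # observed outcomes
# Unordered "predicted" quantiles, e.g. from an unconstrained ensemble of base models.
preds = y + rng.normal(scale=0.5, size=(len(taus), y.size))

before = avg_pinball(y, preds, taus).mean()
after = avg_pinball(y, np.sort(preds, axis=0), taus).mean()   # post-sorting across levels
print(f"avg pinball before sorting: {before:.4f}, after sorting: {after:.4f}")
# 'after' should be <= 'before', consistent with the result stated in the abstract.
```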