Extreme quantile regression provides estimates of conditional quantiles outside the range of the data. Classical methods such as quantile random forests perform poorly in such cases since data in the tail region are too scarce. Extreme value theory motivates approximating the conditional distribution above a high threshold by a generalized Pareto distribution with covariate-dependent parameters. This model allows for extrapolation beyond the range of observed values and for estimation of conditional extreme quantiles. We propose a gradient boosting procedure to estimate a conditional generalized Pareto distribution by minimizing its deviance. Cross-validation is used to choose tuning parameters such as the number of trees and the tree depths. We discuss diagnostic plots such as variable importance and partial dependence plots, which help to interpret the fitted models. In simulation studies we show that our gradient boosting procedure outperforms classical methods from quantile regression and extreme value theory, especially for high-dimensional predictor spaces and complex parameter response surfaces. An application to statistical post-processing of weather forecasts with precipitation data in the Netherlands is presented.
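
The two ingredients named in the abstract, the generalized Pareto loss and the extrapolation to extreme quantiles, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gpd_nll` and `extreme_quantile` are hypothetical, the loss shown is the standard generalized Pareto negative log-likelihood (which we assume matches the deviance up to convention), and the boosting step that would fit the scale `sigma` and shape `xi` as functions of the covariates is omitted.

```python
import numpy as np


def gpd_nll(z, sigma, xi):
    """Per-observation negative log-likelihood of a generalized Pareto
    distribution for exceedances z >= 0, with scale sigma > 0 and shape xi.
    Returns inf where z lies outside the support."""
    z, sigma, xi = np.broadcast_arrays(
        np.asarray(z, dtype=float),
        np.asarray(sigma, dtype=float),
        np.asarray(xi, dtype=float),
    )
    small = np.abs(xi) < 1e-8            # treat as the exponential limit xi -> 0
    xi_safe = np.where(small, 1.0, xi)   # placeholder to avoid dividing by zero
    arg = 1.0 + xi_safe * z / sigma
    with np.errstate(invalid="ignore", divide="ignore"):
        general = np.log(sigma) + (1.0 + 1.0 / xi_safe) * np.log(arg)
    general = np.where(arg > 0, general, np.inf)
    exponential = np.log(sigma) + z / sigma
    return np.where(small, exponential, general)


def extreme_quantile(tau, tau0, threshold, sigma, xi):
    """Extrapolated quantile at level tau > tau0, where `threshold` is the
    (conditional) quantile at the intermediate level tau0 and the excess
    above it is modelled as generalized Pareto with parameters sigma, xi."""
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-8:
        return threshold + sigma * np.log(ratio)
    return threshold + sigma / xi * (ratio**xi - 1.0)


# Illustrative values only (not taken from the paper).
q99 = extreme_quantile(tau=0.99, tau0=0.8, threshold=1.0, sigma=2.0, xi=0.25)
loss = gpd_nll(z=[0.5, 3.0], sigma=2.0, xi=0.25).mean()
```

In a boosting procedure of the kind described, `sigma` and `xi` would each be the output of a tree ensemble evaluated at the covariates, and the average of `gpd_nll` over the exceedances would be the quantity minimized.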