Gradient boosting for extreme quantile regression

Extreme quantile regression provides estimates of conditional quantiles outside the range of the data. Classical quantile regression performs poorly in such cases because data in the tail region are too scarce. Extreme value theory is used to extrapolate beyond the range of observed values and to estimate conditional extreme quantiles. Based on the peaks-over-threshold approach, the conditional distribution above a high threshold is approximated by a generalized Pareto distribution with covariate-dependent parameters. We propose a gradient boosting procedure that estimates a conditional generalized Pareto distribution by minimizing its deviance. Cross-validation is used to choose tuning parameters such as the number of trees and the tree depths. We discuss diagnostic plots such as variable importance and partial dependence plots, which help to interpret the fitted models. In simulation studies we show that our gradient boosting procedure outperforms classical methods from quantile regression and extreme value theory, especially for high-dimensional predictor spaces and complex parameter response surfaces. We present an application to statistical post-processing of weather forecasts with precipitation data in the Netherlands.
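
To make the deviance-minimization step concrete, here is a minimal Python sketch, not the authors' implementation. It takes the deviance to be the generalized Pareto negative log-likelihood of an exceedance z above the threshold, which is an assumption here. It shows that deviance, its partial derivatives with respect to the scale and shape parameters, and the peaks-over-threshold quantile extrapolation. The function names (gpd_deviance, gpd_deviance_grad, extreme_quantile) and the symbols sigma, gamma, tau0 and threshold are hypothetical and introduced for illustration only. The shape parameter is assumed to be nonzero.

```python
import numpy as np


def gpd_deviance(z, sigma, gamma):
    """Per-observation GPD negative log-likelihood (assumed deviance).

    z     : exceedances above the threshold (z > 0)
    sigma : scale parameter (sigma > 0)
    gamma : shape parameter (assumed nonzero, with 1 + gamma*z/sigma > 0)
    """
    z = np.asarray(z, dtype=float)
    return np.log(sigma) + (1.0 + 1.0 / gamma) * np.log1p(gamma * z / sigma)


def gpd_deviance_grad(z, sigma, gamma):
    """Partial derivatives of gpd_deviance with respect to sigma and gamma."""
    z = np.asarray(z, dtype=float)
    d_sigma = (1.0 - (1.0 + gamma) * z / (sigma + gamma * z)) / sigma
    d_gamma = (
        -np.log1p(gamma * z / sigma) / gamma**2
        + (1.0 + 1.0 / gamma) * z / (sigma + gamma * z)
    )
    return d_sigma, d_gamma


def extreme_quantile(tau, tau0, threshold, sigma, gamma):
    """Extrapolate the quantile at level tau > tau0 from a GPD tail fit.

    threshold is the quantile at the intermediate level tau0; the tail above
    it is modelled by a GPD with parameters (sigma, gamma).
    """
    ratio = (1.0 - tau0) / (1.0 - tau)
    return threshold + sigma * (ratio**gamma - 1.0) / gamma
```

In a generic gradient boosting scheme, regression trees would be fitted at each iteration to the negative of these derivatives, evaluated at the current covariate-dependent values of sigma and gamma. How the paper arranges the updates for the two parameters is not stated in the abstract.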
