Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only a few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods, or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. Under certain assumptions, we show consistency of the estimated parameters. Furthermore, we penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
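As a minimal sketch of the estimator just described, the conditional parameters could be obtained from a weighted, penalized generalized Pareto likelihood. The notation here is introduced only for illustration and is not taken from the abstract: $w_n(x, X_i)$ denotes the similarity weight of observation $i$ at predictor value $x$ extracted from the quantile random forest, $Z_i = Y_i - \hat q_{\tau_0}(X_i)$ is the exceedance over an estimated intermediate quantile $\hat q_{\tau_0}$ (with only observations satisfying $Y_i > \hat q_{\tau_0}(X_i)$ contributing), $\ell_{\sigma,\xi}$ is the generalized Pareto log-likelihood, $\lambda \ge 0$ is a penalty strength, and $\xi_0$ is a reference shape value toward which the estimate is shrunk:
$$
\bigl(\hat\sigma(x), \hat\xi(x)\bigr) \in \operatorname*{arg\,max}_{\sigma > 0,\ \xi} \; \sum_{i:\, Y_i > \hat q_{\tau_0}(X_i)} w_n(x, X_i)\, \ell_{\sigma,\xi}(Z_i) \;-\; \lambda\,(\xi - \xi_0)^2,
\qquad
\ell_{\sigma,\xi}(z) = -\log\sigma - \Bigl(1 + \tfrac{1}{\xi}\Bigr)\log\Bigl(1 + \xi\,\tfrac{z}{\sigma}\Bigr),
$$
where the log-likelihood is evaluated only on its support, $1 + \xi z/\sigma > 0$. The quadratic penalty is one plausible way to regularize the variability of the shape parameter across the predictor space, as mentioned in the abstract.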