This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles, calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of pooling weights based on minimization of the asymptotic variance and of the asymptotic mean squared error (AMSE). In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the infeasible combination of all subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks when the bias is large. We also consider scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be small. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm that our pooled estimators perform virtually as well as the benchmark estimators. Two applications, to real weather and insurance data, are showcased.
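To make the pooling idea concrete, here is a minimal sketch, not the paper's exact estimators or optimal weights. It uses the standard Hill estimator, γ̂ = (1/k) Σ_{i=1}^{k} log X_{n-i+1,n} − log X_{n-k,n}, and the standard Weissman estimator, q̂(p) = X_{n-k,n} (k/(np))^{γ̂}, on each sample. It then pools them with a weighted arithmetic mean of the Hill estimates and a weighted geometric mean of the Weissman estimates. The helper names `hill`, `weissman`, `pooled_hill` and `pooled_weissman` are hypothetical. The default weights, proportional to k_j, are an assumption: they are the inverse of the approximate asymptotic Hill variance γ²/k_j only when the samples are independent and share a common tail index.

```python
import numpy as np

def hill(x, k):
    """Standard Hill estimator from the k largest (positive) observations of x."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    top = xs[n - k:]           # X_{n-k+1,n}, ..., X_{n,n}
    threshold = xs[n - k - 1]  # X_{n-k,n}
    return float(np.mean(np.log(top)) - np.log(threshold))

def weissman(x, k, p):
    """Standard Weissman extrapolation of the (1 - p)-quantile of one sample."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    gamma = hill(xs, k)
    return float(xs[n - k - 1] * (k / (n * p)) ** gamma)

def _weights(ks, weights):
    # Assumed default: weights proportional to k_j (inverse-variance weighting
    # for independent samples with a common tail index); always normalized.
    w = np.asarray(ks if weights is None else weights, dtype=float)
    return w / w.sum()

def pooled_hill(samples, ks, weights=None):
    """Weighted arithmetic mean of per-sample Hill estimates (illustrative)."""
    gammas = np.array([hill(x, k) for x, k in zip(samples, ks)])
    return float(np.dot(_weights(ks, weights), gammas))

def pooled_weissman(samples, ks, p, weights=None):
    """Weighted geometric mean of per-sample Weissman estimates (illustrative)."""
    log_q = np.log([weissman(x, k, p) for x, k in zip(samples, ks)])
    return float(np.exp(np.dot(_weights(ks, weights), log_q)))
```

For example, with four simulated Pareto samples of tail index 0.5 (`1.0 + rng.pareto(2.0, size=5000)` for a seeded `numpy` generator `rng`) and `ks = [500] * 4`, `pooled_hill` should land close to 0.5. With `p = 1e-3`, `pooled_weissman` should land near the true quantile 1e-3 ** -0.5 ≈ 31.6. Averaging on the log scale keeps the pooled quantile estimate positive and treats the multiplicative Weissman errors symmetrically. The optimal weights derived in the paper, including those that minimize the AMSE or account for dependence between samples, are not reproduced here.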