Modern statistical analyses often encounter massive datasets with heavy-tailed distributions. For such datasets, traditional estimation methods can hardly be applied to estimate the extreme value index directly. To address this issue, we propose a subsampling-based method. Specifically, multiple subsamples are drawn from the whole dataset by simple random subsampling with replacement. Based on each subsample, an approximate maximum likelihood estimator is computed. The resulting estimators are then averaged to form a more accurate one. Under appropriate regularity conditions, we show theoretically that the proposed estimator is consistent and asymptotically normal. With the help of the estimated extreme value index, we can consistently estimate high-level quantiles and tail probabilities of a heavy-tailed random variable. Extensive simulation experiments demonstrate the promising performance of our method. A real data analysis is also presented for illustration.
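The subsample-and-average procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the form of the approximate maximum likelihood estimator, so the classical Hill estimator is swapped in as a stand-in, and the function names (`hill_estimator`, `subsample_average_evi`) and the parameters `n_subsamples`, `subsample_size`, and `k` are hypothetical choices made for this example.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator of the extreme value index from the k largest values.

    Assumption: used here as a stand-in for the approximate maximum
    likelihood estimator mentioned in the abstract.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Mean log-excess of the k largest values over the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def subsample_average_evi(data, n_subsamples, subsample_size, k, rng):
    """Average Hill estimates over simple random subsamples.

    Each subsample is drawn from `data` with replacement, and the
    per-subsample estimates are averaged into one final estimate.
    """
    estimates = []
    for _ in range(n_subsamples):
        subsample = rng.choices(data, k=subsample_size)  # with replacement
        estimates.append(hill_estimator(subsample, k))
    return sum(estimates) / len(estimates)


# Synthetic check: Pareto data with shape 2 has extreme value index 1/2.
rng = random.Random(0)
data = [rng.paretovariate(2.0) for _ in range(100_000)]
estimate = subsample_average_evi(
    data, n_subsamples=50, subsample_size=2_000, k=200, rng=rng
)
print(f"averaged estimate of the extreme value index: {estimate:.3f}")
```

In this sketch each subsample is small enough to sort in memory, which is the practical point of subsampling when the full dataset is too large to process directly. The values of `subsample_size` and `k` above are arbitrary illustrative choices, not recommendations from the paper.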