Modelling block maxima using the generalised extreme value (GEV) distribution is a classical and widely used method for studying univariate extremes. It allows for theoretically motivated estimation of return levels, including extrapolation beyond the range of observed data. A frequently overlooked challenge in applying this methodology comes from handling datasets containing missing values. In this case, one cannot be sure whether the true maximum has been recorded in each block, and simply ignoring the issue can lead to biased parameter estimators and, crucially, underestimated return levels. We propose an extension of the standard block maxima approach to overcome such missing data issues. This is achieved by explicitly accounting for the proportion of missing values in each block within the GEV model. Inference is carried out using likelihood-based techniques, and we propose an update to commonly used diagnostic plots to assess model fit. We assess the performance of our method via a simulation study, with results that are competitive with the "ideal" case of having no missing values. The practical use of our methodology is demonstrated on sea surge data from Brest, France, and air pollution data from Plymouth, U.K.
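The abstract does not say how the proportion of missing values enters the GEV model. The sketch below shows one natural reading, offered as an assumption rather than as the authors' exact formulation. If the maximum of a complete block follows G(x; mu, sigma, xi), and a proportion p of that block is observed with missingness unrelated to the values, then by max-stability the observed maximum approximately follows G(x)^p. This is again a GEV, with location mu + sigma (p^xi - 1) / xi, scale sigma p^xi, and the same shape xi. The function names and the argument `props` (the per-block proportion of non-missing values) are hypothetical.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function G(x; mu, sigma, xi)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:              # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                     # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def adjusted_params(mu, sigma, xi, p):
    """Parameters of G**p: the assumed law of the maximum of a block
    in which only a proportion p (0 < p <= 1) of values is observed."""
    if abs(xi) < 1e-12:
        return mu + sigma * math.log(p), sigma, xi
    scale = sigma * p ** xi
    return mu + (scale - sigma) / xi, scale, xi

def gev_logpdf(x, mu, sigma, xi):
    """Log-density of the GEV distribution."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return -math.log(sigma) - z - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)

def neg_log_lik(theta, maxima, props):
    """Negative log-likelihood in which each observed block maximum
    uses GEV parameters adjusted for that block's observed proportion."""
    mu, sigma, xi = theta
    if sigma <= 0:
        return math.inf
    total = 0.0
    for m, p in zip(maxima, props):
        total -= gev_logpdf(m, *adjusted_params(mu, sigma, xi, p))
    return total
```

Under these assumptions, `neg_log_lik` could be passed to a general-purpose numerical optimiser to estimate (mu, sigma, xi) on the scale of complete blocks, so that return levels refer to blocks without missing values. Setting every entry of `props` to 1 recovers the standard block maxima likelihood.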