This paper details a methodology proposed for the EVA 2021 conference data challenge. The aim of this challenge was to predict the number and size of wildfires over the contiguous US between 1993 and 2015, with more importance placed on extreme events. In the data set provided, over 14% of both wildfire count and burnt area observations are missing; the objective of the data challenge was to estimate a range of marginal probabilities from the distribution functions of these missing observations. To enable this prediction, we make the assumption that the marginal distribution of a missing observation can be informed using non-missing data from neighbouring locations. In our method, we select spatial neighbourhoods for each missing observation and fit marginal models to non-missing observations in these regions. For the wildfire counts, we assume the compiled data sets follow a zero-inflated negative binomial distribution, while for burnt area values, we model the bulk and tail of each compiled data set using non-parametric and parametric techniques, respectively. Cross validation is used to select tuning parameters, and the resulting predictions are shown to significantly outperform the benchmark method proposed in the challenge outline. We conclude with a discussion of our modelling framework, and evaluate ways in which it could be extended.
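The two marginal models named above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the parameterisation (`pi`, `r`, `p` for the counts; `u`, `sigma`, `xi` for the burnt areas) and the choice of a generalised Pareto distribution for the parametric tail are assumptions made for illustration, since the abstract states only that the tail is modelled parametrically and the bulk non-parametrically. Parameter estimation and neighbourhood selection are omitted; the parameters are taken as given.

```python
import math


def zinb_cdf(k, pi, r, p):
    """P(N <= k) under a zero-inflated negative binomial (illustrative).

    pi: probability of a structural zero, in [0, 1]
    r:  negative binomial size parameter, r > 0
    p:  negative binomial success probability, 0 < p < 1
    """
    if k < 0:
        return 0.0
    nb_cdf = 0.0
    for n in range(int(k) + 1):
        # log of Gamma(n + r) / (Gamma(r) * n!) * p^r * (1 - p)^n
        log_pmf = (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
                   + r * math.log(p) + n * math.log1p(-p))
        nb_cdf += math.exp(log_pmf)
    # Mixture of a point mass at zero and the negative binomial component.
    return pi + (1.0 - pi) * min(nb_cdf, 1.0)


def bulk_tail_cdf(x, sample, u, sigma, xi):
    """Empirical CDF up to threshold u, generalised Pareto tail above it.

    sample: non-missing values pooled from the neighbourhood (assumed input)
    u:      tail threshold; sigma > 0 and xi are assumed GPD scale and shape
    """
    n = len(sample)
    if x <= u:
        return sum(1 for s in sample if s <= x) / n
    p_u = sum(1 for s in sample if s <= u) / n  # empirical P(X <= u)
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        tail = 1.0 - math.exp(-z)  # exponential limit as xi -> 0
    else:
        t = 1.0 + xi * z
        # For xi < 0 the tail has a finite upper endpoint; beyond it the CDF is 1.
        tail = 1.0 if t <= 0 else 1.0 - t ** (-1.0 / xi)
    return p_u + (1.0 - p_u) * tail


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    print(zinb_cdf(3, pi=0.3, r=2.0, p=0.5))
    areas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(bulk_tail_cdf(12.0, areas, u=8.0, sigma=2.0, xi=0.5))
```

Evaluating either function over a grid of thresholds gives marginal probabilities of the form the challenge asks for. In this sketch the splice at `u` is continuous by construction, because the tail contribution is zero at `x = u`.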