Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. The method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage, high-dimensional, bivariate sparse spatial model for zero-inflated data, developed using stochastic partial differential equations (SPDE). In Stage 1, the observations are classified as zero or nonzero and are modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, before the positive observations are modeled using spatially varying coefficients, smoothed parameter surfaces are obtained from empirical estimates using fixed rank kriging. This approximate Bayesian inference approach avoids the high computational burden of fitting spatially varying coefficient models to large spatial datasets. In Stage 3, the standardized log-transformed positive observations from Stage 2 are further modeled using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts made in Stage 3 is computationally efficient but misspecified. Thus, in Stage 4, the predicted values are corrected using random forests. Posterior inference for Stages 1 and 3 is drawn using Markov chain Monte Carlo (MCMC) sampling. A cross-validation scheme is then constructed for the artificially generated gaps, and the EVA 2021 prediction scores of the proposed model are compared with those obtained using some natural competitors.
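How the outputs of Stages 1 to 3 can fit together at a single site is illustrated below with a minimal sketch. This sketch is not the authors' implementation. It assumes a hurdle-style structure: Stage 1 supplies a probability `p_nonzero` of a nonzero value. Stage 2 supplies smoothed location and scale values `mu` and `sigma` used to standardize log observations. Stage 3 supplies a Gaussian predictive distribution N(`z_mean`, `z_sd`²) for the standardized log value. All function names and parameter values are hypothetical. The SPDE/MCMC fitting and the Stage 4 random-forest correction are not reproduced.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standardize(y: float, mu: float, sigma: float) -> float:
    """Stage 2 (sketch): standardized log of a positive observation,
    z = (log y - mu) / sigma, with mu and sigma taken from smoothed
    parameter surfaces at the site (hypothetical inputs)."""
    if y <= 0:
        raise ValueError("only positive observations are standardized")
    return (math.log(y) - mu) / sigma


def zero_inflated_cdf(u: float, p_nonzero: float, mu: float, sigma: float,
                      z_mean: float, z_sd: float) -> float:
    """P(Y <= u) at one site under the assumed hurdle structure:
    Y = 0 with probability 1 - p_nonzero (Stage 1); otherwise
    log(Y) = mu + sigma * Z with Z ~ N(z_mean, z_sd**2) (Stages 2-3).
    The positive part is continuous, so for counts it is only an
    approximation (the misspecification that Stage 4 targets)."""
    if u < 0:
        return 0.0
    if u == 0:
        return 1.0 - p_nonzero
    # Positive part: log(Y) ~ N(mu + sigma * z_mean, (sigma * z_sd)**2)
    x = (math.log(u) - mu - sigma * z_mean) / (sigma * z_sd)
    return (1.0 - p_nonzero) + p_nonzero * normal_cdf(x)


if __name__ == "__main__":
    # Hypothetical site-level outputs from Stages 1-3.
    for u in [0, 1, 5, 10, 50]:
        print(u, round(zero_inflated_cdf(u, 0.3, 1.0, 0.8, 0.0, 1.0), 4))
```

Because the zero mass and the positive part are handled separately, the resulting predictive CDF jumps to 1 - `p_nonzero` at zero and then increases smoothly. It can be evaluated at any set of severity thresholds.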