Spatially correlated data with an excess of zeros, usually referred to as zero-inflated spatial data, arise in many disciplines. Examples include count data, such as the abundance (or absence) of animal species and disease counts, as well as semi-continuous data such as observed precipitation. Spatial two-part models are a flexible class of models for such data. Fitting two-part models can be computationally expensive for large data sets because of high-dimensional dependent latent variables, costly matrix operations, and slow-mixing Markov chains. We describe a flexible, computationally efficient approach for modeling large zero-inflated spatial data using the projection-based intrinsic conditional autoregression (PICAR) framework. We study our approach, which we call PICAR-Z, through extensive simulation studies and two environmental data sets. Our results suggest that PICAR-Z provides accurate predictions while remaining computationally efficient. An important goal of our work is to allow researchers who are not experts in computation to easily build computationally efficient extensions to zero-inflated spatial models; this also allows for a more thorough exploration of modeling choices in two-part models than was previously possible. We show that PICAR-Z is easy to implement and extend in popular probabilistic programming languages such as nimble and Stan.