Earth-observing satellite instruments obtain a massive number of observations every day. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument collects tens of millions of sea surface temperature (SST) observations on a global scale each day. Despite their size, such datasets are incomplete and noisy, so spatial statistical inference is needed to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging because of the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers that affect the dependence structure of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires a partition of the domain, which can be set up in an application-specific way. For SST, we choose the partition purposefully to account for land barriers and weaken dependence across them. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, to our knowledge the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach.
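To make the role of the domain partition concrete, the following is a minimal sketch of a two-level (M = 1) multi-resolution approximation on the unit interval. It assumes a stationary exponential covariance, 1-D locations, equal-width regions, and a hypothetical set of global knots; the names `exp_cov` and `mra_two_level_cov` and all parameter values are illustrative assumptions. It does not implement the local fits, the nonstationary global covariance, the land-barrier-aware partitioning, or the distributed implementation described above. Level 0 captures large-scale dependence through a low-rank term; level 1 keeps the remaining covariance only within each region, so dependence across region boundaries is carried by level 0 alone.

```python
import numpy as np


def exp_cov(a, b, range_=0.3, sill=1.0):
    """Stationary exponential covariance between 1-D location arrays a and b."""
    d = np.abs(a[:, None] - b[None, :])
    return sill * np.exp(-d / range_)


def mra_two_level_cov(locs, knots0, n_regions, cov=exp_cov):
    """Dense covariance of a two-level (M = 1) multi-resolution approximation on [0, 1].

    Level 0: low-rank term C(s, Q0) C(Q0, Q0)^{-1} C(Q0, s') from global knots Q0.
    Level 1: the residual covariance, kept only within each of n_regions
    equal-width subintervals (zero across regions), i.e. block-diagonal.
    """
    # Level 0: project onto the global knots.
    c_sq = cov(locs, knots0)
    c_qq = cov(knots0, knots0)
    low_rank = c_sq @ np.linalg.solve(c_qq, c_sq.T)

    # Covariance not captured at level 0.
    resid = cov(locs, locs) - low_rank

    # Level 1: assign each location to a region; drop residual dependence across regions.
    region = np.minimum((locs * n_regions).astype(int), n_regions - 1)
    same_region = region[:, None] == region[None, :]
    return low_rank + np.where(same_region, resid, 0.0)


rng = np.random.default_rng(0)
locs = np.sort(rng.uniform(0.0, 1.0, 200))
knots0 = np.linspace(0.0, 1.0, 10)
approx = mra_two_level_cov(locs, knots0, n_regions=4)
exact = exp_cov(locs, locs)
print(np.allclose(np.diag(approx), np.diag(exact)))  # True: marginal variances are preserved
```

The dense matrices here are only for illustration; the point of an M-RA is that its low-rank-plus-block structure, applied recursively over more levels, can be exploited for much cheaper computation than the dense form built above. Because the residual at the finest level is kept exactly within each region, the approximation reproduces the true marginal variances and remains positive semidefinite.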