This work is motivated by the ECMWF CAMS reanalysis data, a valuable resource for researchers in environment-related fields because they contain the most up-to-date atmospheric composition information on a global scale. Unlike observational data obtained from monitoring equipment, reanalysis data are produced computationally via a 4D-Var data assimilation mechanism, so their stochastic properties remain largely unclear. This lack of knowledge in turn limits their scope of use and hinders wider and more flexible statistical applications, especially in spatio-temporal modelling beyond uncertainty quantification and data fusion. This paper therefore studies the stochastic properties of these reanalysis outputs. Using measure theory, we prove the existence of spatial and temporal stochasticity in these reanalysis data and show that they are essentially realisations of digitised versions of hidden real-world spatial and/or temporal stochastic processes. In practice, this means that reanalysis outputs can be treated in the same way as observational data, so that more flexible spatio-temporal stochastic methodologies can be applied to them. We also analyse the different types of error in the reanalysis data and determine their mutual dependence or independence, which together give clear guidance on the modelling of error terms. The results of this study also serve as a solid stepping stone for spatio-temporal modellers and environmental AI researchers who wish to work directly with these reanalysis outputs using stochastic models.
Title: On the stochasticity of 4D-Var reanalysis outputs