Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and it is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data are high-dimensional, heterogeneous, and exhibit non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation, an approach that is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate the information-theoretic measures that characterize multivariate densities: information, entropy, total correlation, and mutual information. We demonstrate how these measures can be applied to various Earth system data analysis problems. First, we show how the method can be used to jointly Gaussianize radar backscattering intensities, synthesize hyperspectral data, and quantify the information content of aerial optical images. We also quantify the information content of several variables describing the soil-vegetation status in agro-ecosystems, and investigate the temporal scales that maximize their shared information under extreme events such as droughts. Finally, we measure the relative information content of the space and time dimensions in remote sensing products and model simulations involving long records of key variables such as precipitation, sensible heat, and evaporation. Results confirm the validity of the method, for which we anticipate wide use and adoption. Code and demos of the implemented algorithms and information-theoretic measures are provided.