Modern ocean datasets are large, multi-dimensional, and inherently spatiotemporal. A common task in oceanographic analysis is to compare such datasets along one or more of the dimensions latitude, longitude, depth, and time, as well as across different data modalities. Here, we show that the Wasserstein distance, also known as the earth mover's distance, provides a promising optimal transport metric for quantifying differences in ocean spatiotemporal data. The Wasserstein distance complements commonly used point-wise difference measures, such as the root mean squared error, by quantifying deviations in terms of apparent displacements (in units of space or time) rather than magnitudes of a measured quantity. Using large-scale gridded remote sensing and ocean simulation data of chlorophyll concentration, a proxy for phytoplankton biomass, in the North Pacific, we show that the Wasserstein distance enables meaningful low-dimensional embeddings of marine seasonal cycles, provides oceanographically relevant summaries of chlorophyll depth profiles, and captures hitherto overlooked trends in the temporal variability of chlorophyll in a warming climate. We also illustrate how the optimal transport vectors underlying the Wasserstein distance calculation can serve as a novel, interpretable visual aid in other exploratory ocean data analysis tasks, e.g., in tracking ocean province boundaries across space and time.
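To make the contrast between displacement-based and magnitude-based comparison concrete, the following minimal sketch compares synthetic one-dimensional depth profiles using both the root mean squared error and the 1-Wasserstein distance. Everything in it is an illustrative assumption rather than the data or code behind the abstract: the depth grid, the Gaussian profile shape, and the helper names `profile`, `rmse`, and `w1` are hypothetical, and profiles are treated as normalized mass distributions over depth. It relies on `scipy.stats.wasserstein_distance`, which handles the one-dimensional case only.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical depth grid in metres (0.5 m spacing); not taken from the paper.
depth = np.linspace(0.0, 200.0, 401)


def profile(center, width=10.0):
    """Synthetic Gaussian-shaped concentration profile peaking at `center` metres."""
    return np.exp(-0.5 * ((depth - center) / width) ** 2)


def rmse(a, b):
    """Point-wise root mean squared error, in concentration units."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def w1(a, b):
    """1-Wasserstein distance between two profiles, in metres.

    The profiles are passed as weights on the shared depth grid; SciPy
    normalizes the weights so each profile is treated as a distribution.
    """
    return float(wasserstein_distance(depth, depth, u_weights=a, v_weights=b))


reference = profile(50.0)

# Compare the reference against copies displaced deeper by a known amount.
results = {}
for shift in (20.0, 40.0, 80.0):
    shifted = profile(50.0 + shift)
    results[shift] = (rmse(reference, shifted), w1(reference, shifted))
    print(f"shift={shift:5.1f} m  RMSE={results[shift][0]:.4f}  W1={results[shift][1]:.2f} m")
```

In this toy setting the Wasserstein distance comes out close to the imposed shift in metres, whereas the RMSE barely changes between the two larger shifts because the profiles no longer overlap. Note that normalizing the profiles discards differences in total concentration, so the two measures answer different questions.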