The data synergy effects of time-series deep learning models in hydrology

When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to regionalize - to divide a large spatial domain into multiple regions and study each region separately - instead of fitting a single model on the entire data (also known as unification). Traditional wisdom in these fields suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, by partitioning the training data, each model has access to fewer data points and cannot learn from commonalities between regions. Here, through two hydrologic examples (soil moisture and streamflow), we argue that unification can often significantly outperform regionalization in the era of big data and deep learning (DL). Common DL architectures, even without bespoke customization, can automatically build models that benefit from regional commonality while accurately learning region-specific differences. We highlight an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. In fact, the performance of the DL models benefited from more diverse rather than more homogeneous training data. We hypothesize that DL models automatically adjust their internal representations to identify commonalities while also providing sufficient discriminatory information to the model. The results here advocate for pooling together larger datasets, and suggest the academic community should place greater emphasis on data sharing and compilation.
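The contrast between regionalization and unification can be made concrete with a toy sketch. The code below is not the paper's method: it swaps the deep learning models for a two-parameter linear model, uses purely synthetic data, and every name in it (`make_region`, `fit_regional`, `fit_unified`, the `attribute` value standing in for a static regional characteristic) is a hypothetical chosen for illustration. It only shows the two training setups side by side: one model per region fitted on that region's samples, versus a single model fitted on the pooled samples with the regional attribute supplied as an input.

```python
import random


def make_region(rng, attribute, n, noise=0.5):
    """Synthetic samples for one region (assumed form, not real hydrologic data):
    y = (1.0 + 2.0 * attribute) * x + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [(1.0 + 2.0 * attribute) * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_regional(xs, ys):
    """Regionalization: least-squares slope through the origin, one region only."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_unified(regions):
    """Unification: fit y = (b0 + b1 * attribute) * x on the pooled data
    by solving the 2x2 normal equations for the features x and attribute * x."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for attribute, (xs, ys) in regions.items():
        for x, y in zip(xs, ys):
            f1, f2 = x, attribute * x
            s11 += f1 * f1
            s12 += f1 * f2
            s22 += f2 * f2
            t1 += f1 * y
            t2 += f2 * y
    det = s11 * s22 - s12 * s12
    b0 = (t1 * s22 - s12 * t2) / det
    b1 = (s11 * t2 - s12 * t1) / det
    return b0, b1


def mse(slope, xs, ys):
    """Mean squared error of the prediction slope * x."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
attributes = [0.1, 0.4, 0.7, 1.0]  # one hypothetical static attribute per region
train = {a: make_region(rng, a, n=8) for a in attributes}    # few samples per region
test = {a: make_region(rng, a, n=200) for a in attributes}

b0, b1 = fit_unified(train)
regional_mse = {a: mse(fit_regional(*train[a]), *test[a]) for a in attributes}
unified_mse = {a: mse(b0 + b1 * a, *test[a]) for a in attributes}

for a in attributes:
    print(f"attribute={a}: regional MSE={regional_mse[a]:.3f}, "
          f"unified MSE={unified_mse[a]:.3f}")
```

Each regional fit sees 8 samples, while the unified fit sees all 32 and can share what the regions have in common. Whether pooling helps in a given run depends on the noise level, the sample sizes, and how well the shared model form matches the data, so the printed errors should be read as an illustration of the setup and not as evidence for the paper's claim.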

