Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications

Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu, Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher Duffy, John Nieber, Vipin Kumar

Comments (from arXiv): 1. Added experimental results on LSTM and Transformer. 2. Updated the time efficiency table (Table 4). 3. Shared code and data.

In many environmental applications, recurrent neural networks (RNNs) are used to model physical variables with long temporal dependencies. However, under mini-batch training, temporal relationships between training segments within a batch (intra-batch) as well as between batches (inter-batch) are not considered, which can limit performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves performance, it leads to much longer training times because training becomes highly sequential. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep right before the start of the training segment. In other words, we provide an initial value of the target variable as an additional input so that the network can focus on learning changes relative to that initial value. With this strategy, samples can be passed in any order (mini-batch training), which significantly reduces training time while maintaining performance. In demonstrating our approach in hydrological modeling, we observe that the most significant gains in predictive accuracy occur when these methods are applied to state variables whose values change more slowly, such as soil water and snowpack, rather than continuously moving flux variables such as streamflow.
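The second strategy described in the abstract can be illustrated with a minimal data-preparation sketch. This is not the authors' code: the function name `make_augmented_segments`, the non-overlapping segmentation, and the choice to repeat the initial value at every timestep of the segment as an extra input channel are all assumptions made for illustration.

```python
import numpy as np


def make_augmented_segments(x, y, seg_len):
    """Cut a time series into segments and attach an initial target value.

    Hypothetical helper, not taken from the paper.
    x: array of shape (T, F) with input drivers.
    y: array of shape (T,) with the target variable.
    Returns inputs of shape (N, seg_len, F + 1) and targets of shape (N, seg_len).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start at index 1 so that y[start - 1] exists for every segment.
    starts = np.arange(1, len(y) - seg_len + 1, seg_len)
    inputs, targets = [], []
    for s in starts:
        # Target value from the timestep right before the segment starts,
        # repeated along the segment (an assumption of this sketch).
        y0 = np.full((seg_len, 1), y[s - 1])
        inputs.append(np.concatenate([x[s:s + seg_len], y0], axis=1))
        targets.append(y[s:s + seg_len])
    return np.stack(inputs), np.stack(targets)


# Toy data: 11 timesteps, 2 input features.
x = np.arange(22).reshape(11, 2)
y = np.arange(11) * 10.0
inputs, targets = make_augmented_segments(x, y, seg_len=5)

# Each segment carries its own initial value, so segments can be shuffled
# freely before being grouped into mini-batches.
order = np.random.default_rng(0).permutation(len(inputs))
shuffled_inputs, shuffled_targets = inputs[order], targets[order]
```

Because the temporal context is carried by the extra input channel rather than by a hidden state passed from the previous segment, this layout places no ordering constraint on the segments within or across batches.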
