Gate functions in recurrent models, such as the LSTM and GRU, play a central role in learning various time scales when modeling time-series data, by using a bounded activation function. However, it is difficult to train gates to capture extremely long time scales because the gradient of the bounded function vanishes for large inputs, which is known as the saturation problem. We closely analyze the relation between the saturation of the gate function and the efficiency of training. We prove that the vanishing gradient of the gate function can be mitigated by accelerating the convergence of the saturating function, i.e., by making the output of the function converge to 0 or 1 faster. Based on this analysis, we propose a gate function, called the fast gate, that has a doubly exponential convergence rate with respect to its inputs, obtained by simple function composition. We empirically show that our method outperforms previous methods in accuracy and computational efficiency on benchmark tasks involving extremely long time scales.
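The abstract does not give the exact form of the fast gate, so the following is only an illustrative sketch of how composing the sigmoid with an exponentially growing function produces a doubly exponential saturation rate; the choice $\alpha(x) = \sinh x$ below is an assumption made for illustration, not necessarily the authors' construction.

\[
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad 1 - \sigma(z) \sim e^{-z} \quad (z \to \infty),
\]
\[
\alpha(x) = \sinh x \sim \tfrac{1}{2} e^{x} \quad (x \to \infty)
\;\;\Longrightarrow\;\;
1 - \sigma\bigl(\alpha(x)\bigr) \sim \exp\!\bigl(-\tfrac{1}{2} e^{x}\bigr),
\]

so the composed gate $\sigma \circ \alpha$ approaches 1 (and, by the odd symmetry of $\alpha$, approaches 0 as $x \to -\infty$) at a doubly exponential rate, whereas the plain sigmoid gate saturates only exponentially fast.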