Training recurrent neural networks is known to be difficult when time dependencies become long. Consequently, training standard gated cells such as gated recurrent units and long short-term memory on benchmarks where long-term memory is required remains an arduous task. In this work, we propose a general way to initialize any recurrent network connectivity through a process called "warm-up" to improve its ability to learn arbitrarily long time dependencies. This initialization process is designed to maximize network reachable multi-stability, i.e., the number of attractors within the network that can be reached through relevant input trajectories. Warm-up is performed before training, using stochastic gradient descent on a specifically designed loss. We show that warming up greatly improves recurrent neural network performance on long-term memory benchmarks for multiple recurrent cell types, but can sometimes impede precision. We therefore introduce a parallel recurrent network structure with partial warm-up that is shown to greatly improve learning on long time series while maintaining high levels of precision. This approach provides a general framework for improving the learning ability of any recurrent cell type when long-term memory is required.
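To make the warm-up idea concrete, the sketch below illustrates the general pattern of pre-training a recurrent cell with gradient descent on an auxiliary objective before any task training. The `multistability_proxy_loss` used here is a hypothetical stand-in, not the loss designed in this work: it merely pushes the hidden states reached from different constant input trajectories apart, as a crude proxy for increasing the number of reachable attractors.

```python
import torch
import torch.nn as nn

# Minimal sketch of a "warm-up" phase before task training, assuming a GRU cell.
# The auxiliary loss below is a hypothetical proxy for reachable multi-stability,
# not the specifically designed loss described in the paper.

torch.manual_seed(0)

hidden_size, input_size, seq_len, n_probes = 32, 4, 50, 16
cell = nn.GRU(input_size, hidden_size, batch_first=True)
optimizer = torch.optim.Adam(cell.parameters(), lr=1e-3)

def multistability_proxy_loss(cell):
    # Drive the network with a batch of random constant inputs and look at the
    # final hidden states; reward pairwise separation between them so that
    # distinct inputs settle into distinct regions of state space.
    probes = torch.randn(n_probes, 1, input_size).repeat(1, seq_len, 1)
    _, h_final = cell(probes)                      # h_final: (1, n_probes, hidden)
    h_final = h_final.squeeze(0)
    dists = torch.cdist(h_final, h_final)          # pairwise distances
    off_diag = dists[~torch.eye(n_probes, dtype=torch.bool)]
    return -off_diag.mean()                        # minimize the negative -> maximize separation

# Warm-up: stochastic gradient descent on the auxiliary loss only,
# performed before the downstream task loss is ever used.
for step in range(200):
    optimizer.zero_grad()
    loss = multistability_proxy_loss(cell)
    loss.backward()
    optimizer.step()

# After warm-up, `cell` would be trained as usual on the long-term memory task.
```

The point of the sketch is the two-stage structure (auxiliary pre-training, then ordinary task training), which applies to any recurrent cell type; the actual multi-stability objective would replace the placeholder loss.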