Recurrent neural networks are important tools for sequential data processing. However, they are notoriously difficult to train. Challenges include capturing complex dependencies between consecutive states as well as keeping training stable and efficient. In this paper, we introduce a recurrent neural architecture called Deep Memory Update (DMU). It updates the previous memory state with a deep transformation of the lagged state and the network input. The architecture can learn to transform its internal state with an arbitrary nonlinear function. Its training is stable and fast because its learning rate is tied to the size of the module. Although DMU is built from standard components, the experimental results presented here confirm that it can compete with, and often outperform, state-of-the-art architectures such as Long Short-Term Memory, Gated Recurrent Units, and Recurrent Highway Networks.
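For a concrete picture of the update the abstract describes, here is a minimal PyTorch sketch of a recurrent cell whose candidate state comes from a deep transformation of the lagged state and the input. The gating form, layer depth, and all names below are illustrative assumptions rather than the paper's exact equations.

```python
import torch
import torch.nn as nn

class DeepMemoryUpdateCell(nn.Module):
    """Sketch of the idea in the abstract: update the previous memory state
    with a deep (multi-layer) transformation of [h_prev, x].
    The convex gated update used here is an assumption, not the paper's spec."""

    def __init__(self, input_size, state_size, hidden_size, depth=2):
        super().__init__()
        layers, in_dim = [], input_size + state_size
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden_size), nn.Tanh()]
            in_dim = hidden_size
        self.deep = nn.Sequential(*layers)        # deep transformation of lagged state + input
        self.candidate = nn.Linear(hidden_size, state_size)
        self.gate = nn.Linear(hidden_size, state_size)

    def forward(self, x, h_prev):
        z = self.deep(torch.cat([x, h_prev], dim=-1))
        u = torch.sigmoid(self.gate(z))           # update gate
        c = torch.tanh(self.candidate(z))         # deep candidate state
        return (1.0 - u) * h_prev + u * c         # gated update of the memory state

# Illustrative usage: unroll the cell over a short random sequence.
cell = DeepMemoryUpdateCell(input_size=8, state_size=16, hidden_size=32)
h = torch.zeros(4, 16)                            # batch of 4 initial states
for x in torch.randn(10, 4, 8):                   # 10 time steps
    h = cell(x, h)
```

The abstract's remark that the learning rate is related to the size of the module could correspond to, for example, scaling the optimizer's learning rate with the state dimension; the exact scaling rule is not specified here and would need to be taken from the paper itself.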