Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The Differential State Framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing, data-driven representation and a slowly changing, implicitly stable state. This requires hardly any more parameters than a classical simple recurrent network. Within the DSF, a new architecture is presented, the Delta-RNN. In language modeling at the word and character levels, the Delta-RNN outperforms popular complex architectures, such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the Delta-RNN's performance is comparable to that of complex gated architectures.
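The fast/slow interpolation described above can be sketched as a minimal NumPy cell. This is only an illustration of the idea under stated assumptions, not the paper's exact parameterization: the weight names (W, V), the gate bias b_r, and the choice to reuse the input projection for the gate are assumptions introduced for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DeltaStyleCell:
    """Minimal sketch of a DSF-style recurrent cell: the new state is a
    per-unit interpolation between a fast, data-driven candidate and the
    slowly changing previous state. Names and gate form are illustrative
    assumptions, not the paper's precise equations."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.W = rng.normal(0.0, scale, (hidden_dim, input_dim))   # input projection
        self.V = rng.normal(0.0, scale, (hidden_dim, hidden_dim))  # recurrent projection
        self.b = np.zeros(hidden_dim)                              # candidate bias
        self.b_r = np.zeros(hidden_dim)                            # gate bias only

    def step(self, x_t, h_prev):
        # Fast-changing, data-driven candidate state.
        z_t = np.tanh(self.W @ x_t + self.V @ h_prev + self.b)
        # Per-unit interpolation rate; the gate reuses the input projection,
        # so it adds only a bias vector beyond a simple RNN's parameters.
        r_t = sigmoid(self.W @ x_t + self.b_r)
        # Near r_t = 1 the unit copies its old state (slow, stable memory);
        # near r_t = 0 it adopts the new data-driven candidate.
        h_t = r_t * h_prev + (1.0 - r_t) * z_t
        return h_t
```

The gate here illustrates why such a design needs hardly any parameters beyond a simple recurrent network: the only additions over the classical update are the bias vector b_r and the elementwise interpolation itself.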