Learning hierarchical structures in sequential data -- from simple algorithmic patterns to natural language -- in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent neural networks (RNNs) struggle to generalize on held-out algorithmic or syntactic patterns without supervision or some inductive bias. To remedy this, many papers have explored augmenting RNNs with various differentiable stacks, by analogy with finite automata and pushdown automata. In this paper, we present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN) that achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks (within 0.05 nats of the information-theoretic lower bound), including a task in which the NS-RNN previously failed to outperform a deterministic stack RNN baseline. Our model assigns arbitrary positive weights instead of probabilities to stack actions, and we provide an analysis of why this improves training. We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language and present results on the Penn Treebank corpus.
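To illustrate the contrast the abstract alludes to between probability-normalized and arbitrary positive stack-action scores, here is a minimal, hypothetical sketch (not the authors' code): a softmax forces the scores for competing actions to sum to one, whereas an elementwise exponential leaves each action's weight free to grow or shrink independently.

```python
import torch

def action_probabilities(scores: torch.Tensor) -> torch.Tensor:
    # Probabilities: normalized so the weights over actions sum to 1.
    return torch.softmax(scores, dim=-1)

def action_weights(scores: torch.Tensor) -> torch.Tensor:
    # Arbitrary positive weights: unnormalized, each weight varies
    # independently of the others (illustrative assumption only).
    return torch.exp(scores)

scores = torch.tensor([1.0, -0.5, 2.0])   # one raw score per stack action
print(action_probabilities(scores))        # sums to 1
print(action_weights(scores))              # positive, unnormalized
```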