Learning Hierarchical Structures with Differentiable Nondeterministic Stacks

Learning hierarchical structures in sequential data -- from simple algorithmic patterns to natural language -- in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent neural networks (RNNs) struggle to generalize on held-out algorithmic or syntactic patterns without supervision or some inductive bias. To remedy this, many papers have explored augmenting RNNs with various differentiable stacks, by analogy with finite automata and pushdown automata (PDAs). In this paper, we improve the performance of our recently proposed Nondeterministic Stack RNN (NS-RNN), which uses a differentiable data structure that simulates a nondeterministic PDA, with two important changes. First, the model now assigns unnormalized positive weights instead of probabilities to stack actions, and we provide an analysis of why this improves training. Second, the model can directly observe the state of the underlying PDA. Our model achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks (within 0.05 nats of the information-theoretic lower bound), including a task on which the NS-RNN previously failed to outperform a deterministic stack RNN baseline. Finally, we propose a restricted version of the NS-RNN that incrementally processes infinitely long sequences, and we present language modeling results on the Penn Treebank.
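The abstract describes the NS-RNN's stack only at a high level. The Python sketch below is a rough illustration of what "simulating a nondeterministic PDA with unnormalized positive weights" can mean. It enumerates weighted PDA configurations by brute force and is not the paper's algorithm: the NS-RNN avoids enumerating stacks by using a dynamic-programming formulation, and in the real model the action weights come from an RNN controller at each time step rather than from a fixed table. The action encoding, the `step` and `reading` helpers, and all weight values are hypothetical. The reading is a distribution over (state, top symbol), loosely mirroring the abstract's point that the model can observe the PDA state.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# A configuration is (PDA state, stack), with the stack stored bottom-to-top.
Config = Tuple[int, Tuple[int, ...]]
# Hypothetical action encoding: (kind, next_state, symbol), kind in {"push", "repl", "pop"}.
Action = Tuple[str, int, Optional[int]]
WeightFn = Callable[[int, int, Action], float]


def step(configs: Dict[Config, float], actions: List[Action],
         weight: WeightFn) -> Dict[Config, float]:
    """Apply every action to every configuration. Weights multiply along a
    path and add when different paths reach the same configuration."""
    new: Dict[Config, float] = defaultdict(float)
    for (q, stack), w in configs.items():
        top = stack[-1]
        for action in actions:
            kind, q_next, sym = action
            w_a = weight(q, top, action)  # any positive number, not a probability
            if kind == "push":
                new_stack = stack + (sym,)
            elif kind == "repl":
                new_stack = stack[:-1] + (sym,)
            elif kind == "pop" and len(stack) > 1:  # never pop the bottom symbol
                new_stack = stack[:-1]
            else:
                continue
            new[(q_next, new_stack)] += w * w_a
    return dict(new)


def reading(configs: Dict[Config, float]) -> Dict[Tuple[int, int], float]:
    """Distribution over (state, top symbol), normalized by the total weight."""
    total = sum(configs.values())
    out: Dict[Tuple[int, int], float] = defaultdict(float)
    for (q, stack), w in configs.items():
        out[(q, stack[-1])] += w / total
    return dict(out)


# Toy run: two states, stack symbols 0 (bottom), 1, 2, and made-up weights.
ACTIONS: List[Action] = [("push", 0, 1), ("push", 1, 2), ("pop", 0, None)]
SCORES = {ACTIONS[0]: 2.0, ACTIONS[1]: 0.5, ACTIONS[2]: 1.0}


def weight(q: int, top: int, action: Action) -> float:
    # Hypothetical: ignores q and top; a real controller would condition on them.
    return SCORES[action]


configs: Dict[Config, float] = {(0, (0,)): 1.0}  # state 0, only the bottom symbol
for _ in range(2):
    configs = step(configs, ACTIONS, weight)
print(reading(configs))  # (0, 1) ≈ 0.571, (1, 2) ≈ 0.143, (0, 0) ≈ 0.286
```

Two paths pop back to `(0, (0,))` after two steps, and their weights add, which is where the nondeterminism shows up. Because `reading` divides by the total weight, multiplying every action weight at a step by the same positive constant leaves the reading unchanged, so in this sketch the weights never need to sum to 1. Enumerating stacks grows exponentially with sequence length, so this version is only usable on toy inputs.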


Related content

MoDELS


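The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition gives the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/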