One of the fundamental principles of contemporary linguistics states that language processing requires the ability to extract recursively nested tree structures. However, it remains unclear whether and how this code could be implemented in neural circuits. Recent advances in Recurrent Neural Networks (RNNs), which achieve near-human performance in some language tasks, provide a compelling model to address such questions. Here, we present a new framework to study recursive processing in RNNs, using subject-verb agreement as a probe into the representations of the neural network. We trained six distinct types of RNNs on a simplified probabilistic context-free grammar designed to independently manipulate the length of a sentence and the depth of its syntactic tree. All RNNs generalized to subject-verb dependencies longer than those seen during training. However, none systematically generalized to deeper tree structures, even those with a structural bias towards learning nested trees (i.e., stack-RNNs). In addition, our analyses revealed primacy and recency effects in the generalization patterns of LSTM-based models, showing that these models tend to perform well on the outer- and innermost parts of a center-embedded tree structure, but poorly on its middle levels. Finally, probing the internal states of the model during the processing of sentences with nested tree structures, we found a complex encoding of grammatical agreement information (e.g., grammatical number), in which all the information for multiple nouns was carried by a single unit. Taken together, these results indicate how neural networks may extract bounded nested tree structures without learning a systematic recursive rule.
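To make the length/depth manipulation concrete, the following is a minimal, hypothetical Python sketch (not the paper's actual grammar or vocabulary; the names `sentence`, `NOUNS`, `VERBS`, and `FILLERS` are illustrative) of a toy generator in the spirit of the grammar described above: prepositional-phrase fillers lengthen the subject-verb dependency without deepening the tree, while object-relative clauses center-embed to increase depth without lengthening the outermost dependency.

```python
import random

# Toy vocabulary; each noun must agree in number with its clause's verb.
# Verb transitivity is ignored for simplicity, as only agreement matters here.
NOUNS = {"sg": ["boy", "cat", "dog"], "pl": ["boys", "cats", "dogs"]}
VERBS = {"sg": ["runs", "chases", "sees"], "pl": ["run", "chase", "see"]}
FILLERS = ["near the tree", "by the old lake", "behind the house"]

def sentence(depth: int, n_fillers: int) -> list[str]:
    """Generate one sentence, independently controlling the two factors.

    depth=1, n_fillers=0 -> "the boy runs"
    depth=2, n_fillers=0 -> "the boy that the dogs chase runs"
    depth=1, n_fillers=2 -> "the boy near the tree by the old lake runs"
    """
    number = random.choice(["sg", "pl"])
    subject = ["the", random.choice(NOUNS[number])]
    middle = []
    # LENGTH: pad the subject-verb dependency with fillers (no extra depth).
    for _ in range(n_fillers):
        middle += random.choice(FILLERS).split()
    # DEPTH: center-embed an object-relative clause (recursive nesting).
    if depth > 1:
        middle += ["that"] + sentence(depth - 1, n_fillers=0)
    # The verb agrees with this clause's subject across the intervening material.
    return subject + middle + [random.choice(VERBS[number])]

if __name__ == "__main__":
    random.seed(0)
    print(" ".join(sentence(depth=3, n_fillers=0)))  # deep, short
    print(" ".join(sentence(depth=1, n_fillers=2)))  # shallow, long
```

A subject-verb agreement probe on such data would then check whether the network assigns higher probability to the verb form whose number matches the corresponding subject, separately for each embedding level.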