LSTMs trained on next-word prediction can accurately perform linguistic tasks that require tracking long-distance syntactic dependencies. Notably, model accuracy approaches human performance on number agreement tasks (Gulordava et al., 2018). However, we lack a mechanistic understanding of how LSTMs perform such linguistic tasks. Do LSTMs learn abstract grammatical rules, or do they rely on simple heuristics? Here, we test gender agreement in French, which requires tracking both hierarchical syntactic structure and the inherent gender of lexical items. Our model reliably predicts long-distance gender agreement in two subject-predicate contexts: noun-adjective and noun-passive-verb agreement. However, the model made more errors on plural noun phrases with gender attractors than on the corresponding singular cases, suggesting a reliance on cues from gendered articles rather than on abstract agreement rules. Overall, our study highlights key ways in which LSTMs deviate from human behaviour and questions whether LSTMs genuinely learn abstract syntactic rules and categories. We propose gender agreement as a useful probe for investigating the underlying mechanisms, internal representations, and linguistic capabilities of LSTM language models.
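The agreement tasks described above are typically scored with a targeted-evaluation protocol (as in Gulordava et al., 2018): the model "passes" an item if it assigns higher probability to the grammatical continuation than to the ungrammatical one. A minimal sketch of that protocol is below; the toy scorer and its log-probabilities are hypothetical stand-ins for a real LSTM language model's `log P(word | prefix)`.

```python
def agreement_accuracy(score, items):
    """Fraction of items where the grammatical continuation outscores
    the ungrammatical one. items: (prefix, grammatical, ungrammatical)."""
    correct = sum(
        1 for prefix, good, bad in items
        if score(prefix, good) > score(prefix, bad)
    )
    return correct / len(items)

# Hypothetical log-probabilities; a real experiment would query an LSTM LM.
TOY_LOGPROBS = {
    ("la porte est", "ouverte"): -1.0,  # feminine adjective (grammatical)
    ("la porte est", "ouvert"): -3.0,   # masculine adjective (ungrammatical)
    ("le mur est", "peint"): -1.2,      # masculine participle (grammatical)
    ("le mur est", "peinte"): -2.5,     # feminine participle (ungrammatical)
}

def toy_score(prefix, word):
    return TOY_LOGPROBS[(prefix, word)]

items = [
    ("la porte est", "ouverte", "ouvert"),   # noun-adjective agreement
    ("le mur est", "peint", "peinte"),       # noun-passive-verb agreement
]
print(agreement_accuracy(toy_score, items))  # → 1.0
```

The same harness extends directly to attractor conditions: one simply adds items whose prefixes contain an intervening noun of the opposite gender and compares accuracy across conditions.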