Neural language models trained with a predictive or masked objective have proven successful at capturing short- and long-distance syntactic dependencies. Here, we focus on verb argument structure in German, which has the interesting property that verb arguments may appear in a relatively free order in subordinate clauses. Therefore, checking that the verb argument structure is correct cannot be done in a strictly sequential fashion; rather, it requires keeping track of the arguments' cases irrespective of their order. We introduce a new probing methodology based on minimal variation sets and show that both Transformers and LSTMs achieve scores substantially better than chance on this test. Like humans, they also show graded judgments, preferring canonical word orders and plausible case assignments. However, we also found unexpected discrepancies in the strength of these effects: the LSTMs have difficulty rejecting ungrammatical sentences containing frequent argument structure types (double nominatives), and the Transformers tend to overgeneralize, accepting some infrequent word orders or implausible sentences that humans barely accept.