Previous work suggests that RNNs trained on natural language corpora can capture number agreement well for simple sentences but perform less well when sentences contain agreement attractors: nouns intervening between the main subject and the verb whose grammatical number differs from that of the subject. This suggests these models may not learn the actual syntax of agreement, but rather infer shallower heuristics such as `agree with the most recent noun'. In this work, we investigate RNN models with varying inductive biases trained on selectively chosen `hard' agreement instances, i.e., sentences with at least one agreement attractor. For these, the verb number cannot be predicted by a simple linear heuristic, and hence they might provide the model with stronger cues for hierarchical syntax. If RNNs can learn the underlying agreement rules when trained on such hard instances, then they should generalize well to other sentences, including simpler ones. However, we observe that several RNN types, including the ONLSTM, which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors when trained solely on sentences with attractors. We analyze how these selectively trained RNNs compare to the baseline (trained on a natural distribution of agreement attractors) along the dimensions of number agreement accuracy, representational similarity, and performance across different syntactic constructions. Our findings suggest that RNNs trained on our hard agreement instances still do not capture the underlying syntax of agreement, but rather tend to overfit the training distribution in a way that leads them to perform poorly on `easy' out-of-distribution instances. Thus, while RNNs are powerful models that can pick up non-trivial dependency patterns, inducing them to do so at the level of syntax rather than surface patterns remains a challenge.
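The notion of an agreement attractor is easy to make concrete: in `the keys to the cabinet are on the table', the singular noun `cabinet' intervenes between the plural subject `keys' and the verb `are'. Below is a minimal sketch (not the paper's code) of how `hard' instances, those with at least one attractor, might be filtered from an annotated corpus. The `AgreementExample` format and the helper names are hypothetical, assumed here purely for illustration; a real pipeline would derive subject, verb, and noun-number positions from a parsed corpus.

```python
# Minimal sketch of attractor detection and 'hard'-instance filtering.
# The data format below is an assumption made for illustration.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AgreementExample:
    tokens: List[str]            # the tokenized sentence
    subject_index: int           # position of the main subject's head noun
    verb_index: int              # position of the agreeing verb
    noun_numbers: Dict[int, str] # token index -> "sg" or "pl" for each noun

def has_attractor(ex: AgreementExample) -> bool:
    """True if any noun between the subject and the verb disagrees in
    number with the subject, i.e. the example contains an attractor."""
    subj_num = ex.noun_numbers[ex.subject_index]
    return any(
        ex.noun_numbers[i] != subj_num
        for i in range(ex.subject_index + 1, ex.verb_index)
        if i in ex.noun_numbers
    )

def hard_instances(data: List[AgreementExample]) -> List[AgreementExample]:
    """Keep only sentences with at least one agreement attractor."""
    return [ex for ex in data if has_attractor(ex)]

# "The keys to the cabinet are on the table": 'cabinet' (sg) intervenes
# between the plural subject 'keys' and 'are' -- an attractor.
ex = AgreementExample(
    tokens="The keys to the cabinet are on the table".split(),
    subject_index=1, verb_index=5,
    noun_numbers={1: "pl", 4: "sg", 8: "sg"},
)
assert has_attractor(ex)
```

On such instances the linear `agree with the most recent noun' heuristic always fails, which is what makes them candidates for teaching the model the hierarchical rule.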