Composition, Attention, or Both?

In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these components, the composition function and the self-attention mechanism, can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with the model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition function allowed syntactic features, but not semantic features, to percolate into subtree representations.
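
To make the two components concrete, the following is a minimal sketch in plain Python and not the authors' implementation. The abstract does not specify the form of the composition function, so `compose` here is a hypothetical stand-in (tanh of a label vector plus the mean of the child vectors), and `attend` is ordinary scaled dot-product attention from one query over the elements of a parse stack. All names, dimensions, and toy values are assumptions chosen for illustration.

```python
import math

def compose(label_vec, child_vecs):
    """Collapse a closed subtree into one vector.

    Stand-in composition (an assumption): tanh(label + mean of children).
    """
    dim = len(label_vec)
    mean = [sum(c[i] for c in child_vecs) / len(child_vecs) for i in range(dim)]
    return [math.tanh(label_vec[i] + mean[i]) for i in range(dim)]

def attend(query, stack):
    """Scaled dot-product attention of one query over all stack elements."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in stack]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, stack))
               for i in range(len(query))]
    return weights, context

# Toy parse stack: one earlier element followed by two word vectors.
stack = [[0.1, 0.2], [0.5, -0.3], [0.4, 0.9]]
np_label = [0.0, 0.1]  # hypothetical embedding of a nonterminal label

# Reduce step: pop the two word vectors and push one composed subtree vector.
right = stack.pop()
left = stack.pop()
stack.append(compose(np_label, [left, right]))

# The newest element then attends over everything currently on the stack.
weights, context = attend(stack[-1], stack)
```

After the reduce step the stack holds a single vector for the whole subtree in place of its children, which is the sense in which composition compresses structure, while the attention weights show how much the model draws on each earlier stack element.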