There are two major classes of natural language grammars: the dependency grammar, which models one-to-one correspondences between words, and the constituency grammar, which models the assembly of one or several corresponding words. While previous unsupervised parsing methods mostly focus on inducing only one class of grammar, we introduce a novel model, StructFormer, that can induce dependency and constituency structure simultaneously. To achieve this, we propose a new parsing framework that can jointly generate a constituency tree and a dependency graph. We then integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism. Experimental results show that our model achieves strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time.
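To make the idea of dependency-constrained self-attention concrete, here is a minimal sketch of one plausible instantiation: standard scaled dot-product attention whose weights are gated by a soft dependency distribution and then renormalized. The function name, the `dep_prob` tensor, and the gate-then-renormalize scheme are illustrative assumptions, not the exact formulation used by StructFormer.

```python
import torch
import torch.nn.functional as F

def dependency_constrained_attention(q, k, v, dep_prob):
    """Scaled dot-product attention modulated by a soft dependency mask.

    q, k, v:   (batch, seq_len, d) query/key/value tensors
    dep_prob:  (batch, seq_len, seq_len) soft probability that token j is
               related to token i under the induced dependency structure
               (illustrative assumption; values in [0, 1])
    """
    d = q.size(-1)
    # Standard attention logits and weights.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    attn = F.softmax(scores, dim=-1)
    # Gate each attention weight by the dependency probability, so a token
    # mostly attends to words it is (softly) predicted to depend on.
    gated = attn * dep_prob
    # Renormalize so the weights over the sequence still sum to one;
    # the whole operation stays differentiable.
    gated = gated / (gated.sum(dim=-1, keepdim=True) + 1e-9)
    return torch.matmul(gated, v)

# Toy usage with random tensors and a uniform dependency distribution.
batch, seq_len, d = 2, 5, 16
q = torch.randn(batch, seq_len, d)
k = torch.randn(batch, seq_len, d)
v = torch.randn(batch, seq_len, d)
dep_prob = torch.full((batch, seq_len, seq_len), 1.0 / seq_len)
out = dependency_constrained_attention(q, k, v, dep_prob)
print(out.shape)  # torch.Size([2, 5, 16])
```

Because the dependency distribution enters only through element-wise multiplication and renormalization, gradients flow back into whatever module produces it, which is what allows the induced structure to be trained end to end with the masked language modeling objective.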