There are two major classes of natural language grammars -- the dependency grammar, which models one-to-one correspondences between words, and the constituency grammar, which models the assembly of one or several corresponding words. While previous unsupervised parsing methods mostly focus on inducing only one class of grammars, we introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time. To achieve this, we propose a new parsing framework that can jointly generate a constituency tree and a dependency graph. We then integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism. Experimental results show that our model achieves strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time.
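To make the idea of dependency-constrained self-attention concrete, here is a minimal numpy sketch. It uses a hard binary dependency mask for illustration, whereas the differentiable mechanism described above would use soft relation weights; the function and variable names are hypothetical, not from the model's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dependency_constrained_attention(q, k, v, dep_mask):
    """Scaled dot-product attention whose scores are masked so each
    token only attends to tokens it is linked to in a dependency
    graph (hard-mask sketch of the soft, differentiable version)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (n, n) attention logits
    scores = np.where(dep_mask, scores, -1e9)  # block non-dependency pairs
    return softmax(scores, axis=-1) @ v

# Toy example: 3 tokens, token 0 is the head of tokens 1 and 2,
# so tokens 1 and 2 may attend to themselves and their head only.
n, d = 3, 4
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
dep_mask = np.array([[1, 1, 1],
                     [1, 1, 0],
                     [1, 0, 1]], dtype=bool)   # self + head/child links
out = dependency_constrained_attention(q, k, v, dep_mask)
print(out.shape)  # (3, 4)
```

Because the masking is applied inside the softmax, replacing the hard mask with soft dependency probabilities keeps the whole operation differentiable, which is what allows the induced structure to be trained end to end with the language-modeling objective.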