Syntax is a latent hierarchical structure that underpins the robust and compositional nature of human language. An active line of inquiry is whether large pretrained language models (LLMs) can acquire syntax by training on text alone; understanding a model's syntactic capabilities is essential to understanding how it processes and makes use of language. In this paper, we propose a new method, SSUD, which induces syntactic structures without supervision from gold-standard parses. Instead, we seek to define formalism-agnostic, model-intrinsic syntactic parses by exploiting a property of syntactic relations: syntactic substitutability. We demonstrate both quantitative and qualitative gains on dependency parsing tasks using SSUD, and induce syntactic structures that we hope shed light on LLMs and linguistic representations alike.