The performance of large language models (LLMs) has recently improved to the point where the models can generate valid and coherent meta-linguistic analyses of data. This paper illustrates the vast potential of analyzing the meta-linguistic abilities of large language models. LLMs are primarily trained on language data in the form of text; analyzing their meta-linguistic abilities is informative both for our understanding of the general capabilities of LLMs and for models of linguistics. In this paper, we propose several types of experiments and prompt designs that allow us to analyze the ability of GPT-4 to generate meta-linguistic analyses. We focus on three subfields of linguistics whose formalisms allow for a detailed analysis of GPT-4's theoretical capabilities: theoretical syntax, phonology, and semantics. We identify types of experiments, provide general guidelines, discuss limitations, and offer future directions for this research program.