Integrating Linguistic Theory and Neural Language Models

Transformer-based language models have recently achieved remarkable results in many natural language tasks. However, performance on leaderboards is generally achieved by leveraging massive amounts of training data, and rarely by encoding explicit linguistic knowledge into neural models. This has led many to question the relevance of linguistics for modern natural language processing. In this dissertation, I present several case studies to illustrate how theoretical linguistics and neural language models are still relevant to each other. First, language models are useful to linguists by providing an objective tool to measure semantic distance, which is difficult to do using traditional methods. On the other hand, linguistic theory contributes to language modelling research by providing frameworks and sources of data to probe our language models for specific aspects of language understanding. This thesis contributes three studies that explore different aspects of the syntax-semantics interface in language models. In the first part of my thesis, I apply language models to the problem of word class flexibility. Using mBERT as a source of semantic distance measurements, I present evidence in favour of analyzing word class flexibility as a directional process. In the second part of my thesis, I propose a method to measure surprisal at intermediate layers of language models. My experiments show that sentences containing morphosyntactic anomalies trigger surprisals earlier in language models than semantic and commonsense anomalies. Finally, in the third part of my thesis, I adapt several psycholinguistic studies to show that language models contain knowledge of argument structure constructions. In summary, my thesis develops new connections between natural language processing, linguistic theory, and psycholinguistics to provide fresh perspectives for the interpretation of language models.
