Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining, on the one hand, the classification accuracy of a syntactic probe and, on the other, the models' behaviour in a semantic application task, using BERT, RoBERTa, and DeBERTa as example PLMs. Our results show that all three PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance has been claimed for PLMs on many NLP tasks, our findings indicate that they still suffer from substantial shortcomings in central domains of linguistic knowledge.
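To illustrate the general idea of a syntactic probe over frozen PLM representations, the following is a minimal sketch, not the probe used in this work: the example sentences, the use of the [CLS] vector from bert-base-uncased, and the logistic-regression classifier are all illustrative assumptions.

```python
# Minimal probing sketch: can a simple classifier over frozen PLM features
# detect whether a sentence instantiates the comparative correlative (CC)?
# Assumes the Hugging Face `transformers` library and scikit-learn.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-uncased"  # RoBERTa or DeBERTa checkpoints can be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Toy labelled data (hypothetical examples): 1 = contains a CC, 0 = does not.
sentences = [
    "The more you practise, the better you get.",
    "The harder they work, the faster it gets done.",
    "The tall man bought a better car.",
    "You get better when you practise more.",
]
labels = [1, 1, 0, 0]

def embed(sentence: str) -> torch.Tensor:
    """Return the final-layer [CLS] representation for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0, 0]

features = torch.stack([embed(s) for s in sentences]).numpy()

# The probe itself is deliberately simple (a linear classifier on frozen
# features), so any classification ability must come from the PLM's
# representations rather than from the probe's own capacity.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.predict(features))
```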