Large Language Models (LLMs) have been reported to achieve strong performance on natural language processing tasks. However, performance metrics such as accuracy do not capture whether a model robustly represents complex linguistic structure. In this work, we propose a framework and a robustness measure to assess the consistency of linguistic representations under syntax-preserving perturbations. We leverage recent advances in extracting linguistic constructs from LLMs to test the robustness of such structures. Empirically, we evaluate four LLMs across six different corpora on the proposed robustness measures. We provide evidence that context-free representations (e.g., GloVe) are in some cases competitive with context-dependent representations from modern LLMs (e.g., BERT), yet are equally brittle to syntax-preserving manipulations. Emergent syntactic representations in neural networks are brittle; our work therefore draws attention to the risk of comparing such structures to those that have been the object of a long-standing debate in linguistics.