While sentence anomalies have been applied periodically for testing in NLP, we have yet to establish a picture of the precise status of anomaly information in representations from NLP models. In this paper we aim to fill two primary gaps, focusing on the domain of syntactic anomalies. First, we explore fine-grained differences in anomaly encoding by designing probing tasks that vary the hierarchical level at which anomalies occur in a sentence. Second, we test not only models' ability to detect a given anomaly, but also the generality of the detected anomaly signal, by examining transfer between distinct anomaly types. Results suggest that all models encode some information supporting anomaly detection, but detection performance varies between anomalies, and only representations from more recent transformer models show signs of generalized knowledge of anomalies. Follow-up analyses support the notion that these models pick up on a legitimate, general notion of sentence oddity, while coarser-grained word position information is likely also a contributor to the observed anomaly detection.