Noisy Text Data: Achilles' Heel of popular transformer based NLP models

In the last few years, the ML community has created a number of new NLP models based on the transformer architecture. These models have shown strong performance on various NLP tasks on benchmark datasets, often surpassing SOTA results. Buoyed by this success, industry practitioners are actively experimenting with fine-tuning these models to build NLP applications for industry use cases. However, for most datasets that practitioners use to build industrial NLP applications, it is hard to guarantee the absence of noise in the data. While most transformer-based NLP models have performed exceedingly well at transferring what they learn from one dataset to another, it remains unclear how these models perform when fine-tuned on noisy text. We address the open question posed by Kumar et al. (2020) and explore the sensitivity of popular transformer-based NLP models to noise in text data. We continue to work with the noise as they define it: spelling mistakes and typos, which are the most commonly occurring kinds of noise. We show experimentally that these models perform poorly on the most common NLP tasks, namely text classification, textual similarity, NER, question answering, and text summarization, on benchmark datasets. We further show that as the noise in the data increases, performance degrades. Our findings suggest that practitioners must be wary of noise in their datasets when fine-tuning popular transformer-based NLP models.
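To make the noise model concrete (spelling mistakes and typos at increasing noise levels), here is a minimal sketch of synthetic typo injection. It is not the authors' procedure: the helper `inject_typos`, its `noise_level` parameter (read here as the fraction of eligible words to corrupt), and the edit operations (one random character deletion, insertion, or substitution per corrupted word) are all hypothetical assumptions chosen for illustration.

```python
import random
import string


def _typo(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit; every branch is guaranteed to change the word."""
    op = rng.choice(["delete", "insert", "substitute"])
    if op == "delete":
        # Drop one character (callers only pass words of length >= 2, so the result is non-empty).
        pos = rng.randrange(len(word))
        return word[:pos] + word[pos + 1:]
    if op == "insert":
        # Insert a random lowercase letter at any position, including the end.
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(string.ascii_lowercase) + word[pos:]
    # Substitute one character with a different lowercase letter.
    pos = rng.randrange(len(word))
    replacement = rng.choice([c for c in string.ascii_lowercase if c != word[pos]])
    return word[:pos] + replacement + word[pos + 1:]


def inject_typos(text: str, noise_level: float, seed: int = 0) -> str:
    """Hypothetical helper: corrupt a fraction `noise_level` of eligible words with one typo each."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)  # seeded so noisy copies are reproducible
    words = text.split()  # note: runs of whitespace are normalized to single spaces
    # Only words with at least two characters are eligible, so deletion never empties a word.
    eligible = [i for i, w in enumerate(words) if len(w) >= 2]
    n_noisy = round(noise_level * len(eligible))
    for i in rng.sample(eligible, n_noisy):
        words[i] = _typo(words[i], rng)
    return " ".join(words)


if __name__ == "__main__":
    sentence = "transformer models perform poorly when fine-tuned on noisy text"
    # Sweep increasing noise levels on the same sentence.
    for level in (0.0, 0.1, 0.3, 0.5):
        print(level, inject_typos(sentence, level, seed=42))
```

A robustness check along these lines would corrupt copies of a dataset at each noise level and compare a fine-tuned model's scores across them; seeding the generator keeps each noisy copy reproducible between runs.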
