Machine learning models can achieve high performance on benchmark natural language processing (NLP) datasets yet fail in more challenging settings. We study this issue in the context of a pre-trained model learning dataset artifacts in natural language inference (NLI), the task of determining the logical relationship between a pair of text sequences. We present a variety of techniques for analyzing and locating dataset artifacts in the crowdsourced Stanford Natural Language Inference (SNLI) corpus, and we examine the stylistic patterns these artifacts follow. To mitigate dataset artifacts, we employ a multi-scale data augmentation technique built on two distinct frameworks: a behavioral-testing checklist at the sentence level and lexical synonym criteria at the word level. The combined method strengthens our model's robustness to perturbation testing, enabling it to consistently outperform the pre-trained baseline.
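The two augmentation scales described above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: the `SYNONYMS` lexicon is a hypothetical stand-in for a real lexical resource, and the sentence-level perturbation shown is a simple CheckList-style typo invariance test (swapping two adjacent characters, which should leave the NLI label unchanged).

```python
import random

rng = random.Random(0)

# Hypothetical miniature synonym lexicon; a real word-level criterion
# would draw on a full lexical resource rather than a hand-coded dict.
SYNONYMS = {"man": ["guy"], "large": ["big", "huge"], "picture": ["photo"]}

def word_level_augment(sentence: str, p: float = 0.5) -> str:
    """Word-level augmentation: swap tokens for synonyms with probability p."""
    toks = []
    for tok in sentence.split():
        syns = SYNONYMS.get(tok.lower())
        toks.append(rng.choice(syns) if syns and rng.random() < p else tok)
    return " ".join(toks)

def sentence_level_perturb(sentence: str) -> str:
    """Sentence-level invariance perturbation: introduce one small typo by
    swapping two adjacent characters in the longest word."""
    toks = sentence.split()
    idx = max(range(len(toks)), key=lambda i: len(toks[i]))
    w = toks[idx]
    j = rng.randrange(len(w) - 1)
    toks[idx] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(toks)

premise = "A man holds a large picture"
print(word_level_augment(premise, p=1.0))   # every known word replaced
print(sentence_level_perturb(premise))      # one adjacent-character swap
```

An augmented training set would pair each perturbed premise/hypothesis with its original gold label, so the model is penalized for relying on surface artifacts rather than the underlying entailment relation.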