While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial features such as lexical overlap. Excessive reliance on such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity between model sizes has its source in the pre-trained model.