Many recent studies have shown that models trained on natural language inference (NLI) datasets can make correct predictions by looking only at the hypothesis while completely ignoring the premise. In this work, we derive adversarial examples that exploit this hypothesis-only bias and explore effective ways to mitigate it. Specifically, we extract various phrases from the hypotheses in the training sets (artificial patterns) and show that they are strong indicators of specific labels. We then identify `hard' and `easy' instances in the original test sets, whose gold labels contradict or agree with those indications, respectively. We also set up baselines including both pretrained models (BERT, RoBERTa, XLNet) and competitive non-pretrained models (InferSent, DAM, ESIM). Beyond the benchmark and baselines, we investigate two debiasing approaches that exploit artificial-pattern modeling to mitigate the hypothesis-only bias: down-sampling and adversarial training. We believe these methods can serve as competitive baselines for NLI debiasing tasks.
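As a rough illustration of the pattern-extraction and hard/easy splitting steps described above (not the authors' exact procedure; the function names, the choice of n-gram granularity, and the `min_count`/`threshold` parameters are assumptions for the sketch), one could collect hypothesis n-grams whose conditional label distribution is strongly skewed, then split test instances by whether those indicators agree with the gold label:

```python
from collections import Counter, defaultdict

def extract_artificial_patterns(pairs, n=2, min_count=50, threshold=0.8):
    """Collect hypothesis n-grams whose label distribution is heavily
    skewed toward one label (min_count/threshold are illustrative)."""
    counts = defaultdict(Counter)  # n-gram -> Counter over labels
    for hypothesis, label in pairs:
        tokens = hypothesis.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])][label] += 1
    patterns = {}
    for gram, label_counts in counts.items():
        total = sum(label_counts.values())
        top_label, top_count = label_counts.most_common(1)[0]
        if total >= min_count and top_count / total >= threshold:
            patterns[gram] = top_label  # strong indicator of this label
    return patterns

def split_hard_easy(test_pairs, patterns, n=2):
    """Mark a test instance `easy' if an extracted pattern in its
    hypothesis points to its gold label, `hard' if every matching
    pattern points to a different label."""
    easy, hard = [], []
    for hypothesis, label in test_pairs:
        tokens = hypothesis.lower().split()
        grams = {" ".join(tokens[i:i + n])
                 for i in range(len(tokens) - n + 1)}
        hit_labels = {patterns[g] for g in grams if g in patterns}
        if label in hit_labels:
            easy.append((hypothesis, label))
        elif hit_labels:
            hard.append((hypothesis, label))
    return easy, hard
```

Under this sketch, a high `threshold` trades coverage for precision: fewer n-grams qualify as artificial patterns, but each one is a more reliable label indicator, which in turn makes the resulting `hard' split a more adversarial test of whether a model relies on the premise.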