Finetuned language models perform remarkably well on text classification tasks. However, they tend to rely on spurious patterns in the training data, which limits their performance on out-of-distribution (OOD) test data. Among recent approaches to mitigating this spurious pattern problem, augmenting the training data with extra counterfactual samples has proven highly effective. Yet counterfactual data generation is costly, since it relies on human annotation. We therefore propose a novel solution that requires annotating only a small fraction (e.g., 1%) of the original training data and automatically generates additional counterfactuals in an encoding vector space. We demonstrate the effectiveness of our approach on sentiment classification, training on IMDb data and testing OOD on Amazon, SemEval, and Yelp. Adding only 1% manual counterfactuals yields noticeable accuracy improvements: +3% over adding 100% extra in-distribution training samples, and +1.3% over alternative counterfactual approaches.
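The abstract does not spell out how counterfactuals are synthesized in the encoding space. The sketch below illustrates one simple way such encoding-space augmentation could work, assuming the counterfactual transformation is approximated by a class-conditional mean shift estimated from the few human-annotated pairs; the function names and the mean-shift heuristic are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def estimate_shifts(pairs, labels):
    """Estimate the mean encoding shift from an original sample to its
    human-written counterfactual, separately for each source class.

    pairs  : list of (orig_encoding, cf_encoding) from the ~1% annotated data
    labels : source-class label (0 = negative, 1 = positive) of each pair
    """
    shifts = {}
    for c in (0, 1):
        deltas = [cf - orig for (orig, cf), y in zip(pairs, labels) if y == c]
        shifts[c] = np.mean(deltas, axis=0)
    return shifts

def synthesize_counterfactuals(X, y, shifts):
    """Shift each unannotated encoding along its class's counterfactual
    direction and flip the label, producing synthetic counterfactual
    training points directly in the encoding vector space."""
    X_cf = np.stack([x + shifts[int(c)] for x, c in zip(X, y)])
    return X_cf, 1 - y
```

The synthetic encodings would then be added to the original ones when training the classification head, mimicking the effect of full manual counterfactual augmentation at a fraction of the annotation cost.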