While state-of-the-art NLP models have achieved excellent performance on a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues manifest as performance problems when models are faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets to reduce any reliance on spurious patterns that may exist in the original data. However, producing high-quality augmented data can be costly and time-consuming, as it usually involves human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach for automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets, using a variety of state-of-the-art benchmarks, demonstrates that our approach achieves significant improvements in model performance compared to models trained on the original data, and even compared to models trained with the benefit of human-generated augmented data.