Recent studies have shown the impressive efficacy of counterfactually augmented data (CAD) for reducing NLU models' reliance on spurious features and improving their generalizability. However, current methods still rely heavily on human effort or task-specific designs to generate counterfactuals, which impedes CAD's applicability to a broad range of NLU tasks. In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework. AutoCAD first uses a classifier to identify rationales without supervision, treating them as the spans to intervene on, which disentangles spurious and causal features. Then, AutoCAD performs controllable generation enhanced by unlikelihood training to produce diverse counterfactuals. Extensive evaluations on multiple out-of-domain and challenge benchmarks demonstrate that AutoCAD consistently and significantly boosts the out-of-distribution performance of powerful pre-trained models across different NLU tasks, with results comparable to or even better than previous state-of-the-art human-in-the-loop or task-specific CAD methods. The code is publicly available at https://github.com/thu-coai/AutoCAD.
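To make the two stages concrete, the following is a minimal sketch and not the released implementation. It assumes a toy lexicon scorer in place of the trained classifier, leave-one-out occlusion as the attribution method, and a plain token-level unlikelihood term of the form -log(1 - p). The names `LEXICON`, `positive_prob`, `find_rationale`, `mask_rationale`, and `unlikelihood_loss` are hypothetical and introduced only for illustration.

```python
import math

# Hypothetical stand-in for a trained classifier: a tiny sentiment lexicon.
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def positive_prob(tokens):
    """Probability of the 'positive' label under the toy lexicon model."""
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def find_rationale(tokens, top_k=1):
    """Return indices of the top_k tokens whose removal changes the prediction most.

    Assumption: leave-one-out occlusion is used here as a simple attribution
    method; the source does not specify which method the framework uses.
    """
    base = positive_prob(tokens)
    impact = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        impact.append((abs(base - positive_prob(occluded)), i))
    # Largest change first; ties are broken by position.
    impact.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(i for _, i in impact[:top_k])


def mask_rationale(tokens, indices, mask="<mask>"):
    """Replace rationale tokens with a mask so a generator can fill them in."""
    chosen = set(indices)
    return [mask if i in chosen else t for i, t in enumerate(tokens)]


def unlikelihood_loss(token_probs, negative_tokens):
    """Sum of -log(1 - p(token)) over tokens the generator should avoid.

    token_probs maps each token to the generator's probability for it at one
    step. Assumption: the tokens to avoid are those of the original rationale.
    """
    return -sum(math.log(1.0 - token_probs[t]) for t in negative_tokens)


tokens = ["the", "movie", "was", "great"]
rationale = find_rationale(tokens)
masked = mask_rationale(tokens, rationale)
penalty = unlikelihood_loss({"great": 0.5, "terrible": 0.3}, ["great"])
```

In this sketch the masked sequence would be handed to a label-conditioned generator, and the unlikelihood term grows as the generator assigns more probability to the original rationale token, which pushes the infilled span away from a copy of the input.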