Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has limited scale and diversity if crowdsourced, and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generations to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples on three evaluation sets written by human workers and via human-AI collaboration.
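The generate-then-filter pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_perturbations` is a hypothetical stand-in for prompting a large language model to rewrite a span of the premise, and `teacher_score` is a hypothetical stand-in for the task-specific teacher model (e.g., an NLI classifier) that judges whether a candidate is a valid, label-flipping counterfactual.

```python
def generate_perturbations(premise: str, span: str) -> list[str]:
    # Hypothetical stand-in for prompting a large general LM to produce
    # phrasal perturbations of `span`; returns canned edits for illustration.
    return [
        premise.replace(span, "an empty field"),
        premise.replace(span, "a crowded stadium"),
    ]

def teacher_score(premise: str, hypothesis: str, target_label: str) -> float:
    # Hypothetical stand-in for the task-specific teacher model that scores
    # how confidently the perturbed example carries the target label.
    return 0.9 if "stadium" in premise else 0.2

def distill_counterfactuals(premise: str, span: str, hypothesis: str,
                            target_label: str, threshold: float = 0.5) -> list[str]:
    """Generate candidate perturbations, then keep only those the
    teacher filter accepts as high-quality counterfactuals."""
    candidates = generate_perturbations(premise, span)
    return [c for c in candidates
            if teacher_score(c, hypothesis, target_label) >= threshold]

kept = distill_counterfactuals(
    premise="Two men are playing soccer in a city park.",
    span="a city park",
    hypothesis="People are watching a sports game.",
    target_label="entailment",
)
# With the canned scorer above, only the "crowded stadium" edit passes the filter.
```

The design point is the division of labor: the general LM supplies diverse phrasal edits cheaply, while the cheaper task-specific teacher provides the quality filter, so neither crowdsourcing nor supervised counterfactual generators are needed.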