Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models

While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor to create very few counterfactuals, or only instantiate limited types of perturbations such as paraphrases or word substitutions. We present Polyjuice, a general-purpose counterfactual generator that allows for control over perturbation types and locations, trained by finetuning GPT-2 on multiple datasets of paired sentences. We show that Polyjuice produces diverse sets of realistic counterfactuals, which in turn are useful in various distinct applications: improving training and evaluation on three different tasks (with around 70% less annotation effort than manual generation), augmenting state-of-the-art explanation techniques, and supporting systematic counterfactual error analysis by revealing behaviors easily missed by human experts.
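To make "control over perturbation types and locations" concrete, here is a minimal sketch of how a sentence pair could be serialized into one string for finetuning a left-to-right language model such as GPT-2. The abstract does not specify the format, so everything here is an assumption for illustration: the marker tokens (`<|perturb|>`, `[BLANK]`, `[SEP]`, `[ANSWER]`), the control code name `negation`, and the helper names `build_training_text` and `build_generation_prompt` are all hypothetical.

```python
# Hypothetical marker tokens; the abstract does not define the actual format.
PERTURB = "<|perturb|>"
BLANK = "[BLANK]"
SEP = "[SEP]"
ANSWER = "[ANSWER]"


def build_training_text(original: str, control_code: str,
                        counterfactual: str, changed_span: str) -> str:
    """Serialize one (original, counterfactual) pair as a single string.

    The control code names the perturbation type; the blank marks the
    perturbation location inside the counterfactual.
    """
    if changed_span not in counterfactual:
        raise ValueError("changed_span must occur in the counterfactual")
    # Replace only the first occurrence so exactly one location is blanked.
    template = counterfactual.replace(changed_span, BLANK, 1)
    return (f"{original} {PERTURB} [{control_code}] {template} "
            f"{SEP} {changed_span} {ANSWER}")


def build_generation_prompt(original: str, control_code: str,
                            template: str) -> str:
    """Build the prefix given to the model at generation time.

    The model would be expected to continue with the text that fills
    the blank, followed by the answer marker.
    """
    return f"{original} {PERTURB} [{control_code}] {template} {SEP}"


if __name__ == "__main__":
    text = build_training_text(
        original="It is great for kids.",
        control_code="negation",
        counterfactual="It is not great for kids.",
        changed_span="not",
    )
    print(text)
```

Under this assumed layout, choosing the control code selects the perturbation type, and choosing where to place the blank selects the location. Leaving the template without a blank decision (for example, blanking the whole sentence) would hand both choices to the model.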

