Causally Estimating the Sensitivity of Neural NLP Models to Spurious Features

Recent work finds that modern natural language processing (NLP) models rely on spurious features for prediction, so mitigating such effects is important. Despite this need, there is no quantitative measure to evaluate or compare the effects of different forms of spurious features in NLP. We address this gap in the literature by quantifying model sensitivity to spurious features with a causal estimand, dubbed CENT, which draws on the concept of average treatment effect from the causality literature. By conducting simulations with four prominent NLP models (TextRNN, BERT, RoBERTa and XLNet), we rank the models by their sensitivity to artificial injections of eight spurious features. We further hypothesize and validate that models that are more sensitive to a spurious feature will be less robust against perturbations with this feature during inference. Conversely, data augmentation with this feature improves robustness to similar perturbations. We find statistically significant inverse correlations between sensitivity and robustness, providing empirical support for our hypothesis.
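To make the average-treatment-effect idea concrete, the following is a minimal sketch, not the paper's definition of CENT (the abstract does not give the exact formula). It assumes the "treatment" is the injection of an artificial token and the "outcome" is whether the model's prediction is correct. The names `inject_feature`, `toy_model`, `average_treatment_effect` and the token `zzz` are all hypothetical, and the toy classifier stands in for a trained model such as BERT.

```python
from statistics import mean


def inject_feature(text, token="zzz"):
    """Hypothetical treatment: append an artificial, label-irrelevant token."""
    return f"{text} {token}"


def toy_model(text):
    """Toy stand-in for a trained classifier (assumption, not a real model).

    Predicts 1 (positive) on the genuine cue "good", but has also
    latched onto the spurious token "zzz".
    """
    return 1 if ("good" in text or "zzz" in text) else 0


def average_treatment_effect(model, examples, treat):
    """Mean change in correctness when the treatment is applied.

    For each (text, label) pair, compare whether the model is correct
    on the treated input versus the untreated input, then average.
    """
    effects = []
    for text, label in examples:
        correct_control = int(model(text) == label)
        correct_treated = int(model(treat(text)) == label)
        effects.append(correct_treated - correct_control)
    return mean(effects)


# Tiny illustrative dataset (made up for this sketch).
examples = [
    ("good movie", 1),
    ("bad movie", 0),
    ("good plot", 1),
    ("dull plot", 0),
]

ate = average_treatment_effect(toy_model, examples, inject_feature)
print(f"Estimated effect of injecting the feature: {ate:+.2f}")
```

In this sketch a negative value means that injecting the feature lowers accuracy on average; a larger magnitude would indicate a model that is more sensitive to that feature, which is the kind of quantity the abstract uses to rank models.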

