Recent work finds that modern natural language processing (NLP) models rely on spurious features for prediction, so mitigating such effects is important. Despite this need, there is no quantitative measure to evaluate or compare the effects of different forms of spurious features in NLP. We address this gap in the literature by quantifying model sensitivity to spurious features with a causal estimand, dubbed CENT, which draws on the concept of the average treatment effect from the causality literature. By conducting simulations with four prominent NLP models -- TextRNN, BERT, RoBERTa and XLNet -- we rank the models by their sensitivity to artificial injections of eight spurious features. We further hypothesize and validate that models that are more sensitive to a spurious feature are less robust against perturbations with this feature during inference; conversely, data augmentation with this feature improves robustness to similar perturbations. We find statistically significant inverse correlations between sensitivity and robustness, providing empirical support for our hypothesis.
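For intuition on the causal quantity that CENT draws on: the average treatment effect (ATE) is the expected difference in an outcome with and without a treatment. The exact definition of CENT appears in the body of the paper; the sketch below is only an illustrative analogue, where $f$ denotes a model's predicted probability, $x_i$ an input text, $x_i \oplus s$ that text with spurious feature $s$ injected, and $\widehat{\mathrm{Sens}}$ a hypothetical sensitivity estimand in the ATE spirit (none of this notation is taken from the paper itself):
\[
\mathrm{ATE} = \mathbb{E}\bigl[\,Y(1) - Y(0)\,\bigr],
\qquad
\widehat{\mathrm{Sens}}(f, s) = \frac{1}{N}\sum_{i=1}^{N}\Bigl(f\bigl(x_i \oplus s\bigr) - f\bigl(x_i\bigr)\Bigr).
\]
Under this reading, injecting the spurious feature plays the role of the treatment and the shift in the model's prediction plays the role of the outcome difference.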