Deep learning models have achieved great success in many fields, yet they remain vulnerable to adversarial examples. This paper examines adversarial vulnerability from a causal perspective and proposes Causal Intervention by Semantic Smoothing (CISS), a novel framework for robustness against natural language attacks. Instead of merely fitting observational data, CISS learns the causal effect p(y|do(x)) by smoothing in the latent semantic space to make robust predictions; this approach scales to deep architectures and avoids the tedious construction of noise customized for specific attacks. CISS is provably robust against word substitution attacks and remains empirically robust even when perturbations are strengthened by unknown attack algorithms. For example, on YELP, CISS surpasses the runner-up by 6.7% in certified robustness against word substitutions and achieves 79.4% empirical robustness when syntactic attacks are integrated.
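To make the idea of smoothing in a latent space concrete, the following is a minimal sketch of generic Gaussian randomized smoothing with a majority vote. It is not the CISS implementation: the `encode` and `classify` functions, the noise scale `sigma`, and the sample count are all hypothetical stand-ins chosen for illustration, and no robustness certificate is computed.

```python
import random
from collections import Counter


def encode(tokens, dim=8):
    """Hypothetical encoder: mean of fixed pseudo-random word vectors."""
    vec = [0.0] * dim
    for tok in tokens:
        word_rng = random.Random(tok)  # seeding by token fixes each word's vector
        for i in range(dim):
            vec[i] += word_rng.gauss(0.0, 1.0) / len(tokens)
    return vec


def classify(latent):
    """Hypothetical base classifier: thresholds the sum of latent coordinates."""
    return 1 if sum(latent) >= 0 else 0


def smoothed_predict(tokens, sigma=0.5, n_samples=1000, seed=0):
    """Majority vote of the base classifier under Gaussian latent noise.

    Returns the winning label and the fraction of noisy samples voting for it.
    """
    rng = random.Random(seed)
    z = encode(tokens)
    votes = Counter()
    for _ in range(n_samples):
        # Perturb the latent representation, not the input words.
        noisy = [zi + rng.gauss(0.0, sigma) for zi in z]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


if __name__ == "__main__":
    print(smoothed_predict("the food was great".split()))
```

Because the noise is added to the latent vector rather than to the words themselves, the same sampling procedure applies regardless of which attack produced the input, which is the property the abstract highlights. The vote fraction returned here is only a Monte Carlo estimate under the assumed toy model.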