Although deep neural networks (DNNs) have achieved enormous success in many domains such as natural language processing (NLP), they have also been shown to be vulnerable to maliciously crafted adversarial examples. This inherent vulnerability threatens a wide range of real-world DNN-based applications. To strengthen model robustness, several countermeasures have been proposed in the English NLP domain and achieve satisfactory performance. However, due to the unique linguistic properties of Chinese, extending existing defenses to the Chinese domain is non-trivial. We therefore propose AdvGraph, a novel defense that enhances the robustness of Chinese NLP models by incorporating adversarial knowledge into the semantic representation of the input. Extensive experiments on two real-world tasks show that AdvGraph outperforms previous work: (i) effective - it significantly strengthens model robustness even under the adaptive attack setting, without degrading performance on legitimate input; (ii) generic - its key component, the representation of connotative adversarial knowledge, is task-agnostic and can be reused in any Chinese NLP model without retraining; and (iii) efficient - it is a lightweight defense with sub-linear computational complexity, which guarantees the efficiency required in practical scenarios.
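The abstract does not specify how adversarial knowledge is injected into the input representation, so the following is only a minimal, hypothetical sketch of one plausible reading: a task-agnostic graph links each Chinese character to visually or phonetically confusable variants, and the embedding layer blends a character's embedding with the mean embedding of its graph neighbours so that an adversarial substitution maps to a nearby point in representation space. The vocabulary, the `ADV_GRAPH` contents, the `AdvFusedEmbedding` class, and the mixing weight `alpha` are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (assumptions labeled): fuse a character-similarity
# "adversarial knowledge" graph into token embeddings before any encoder.
import torch
import torch.nn as nn

# Toy vocabulary and a hypothetical graph of confusable Chinese characters
# (visually or phonetically similar pairs); purely illustrative.
VOCAB = ["<unk>", "在", "再", "账", "帐", "微", "徽"]
CHAR2ID = {c: i for i, c in enumerate(VOCAB)}
ADV_GRAPH = {"在": ["再"], "再": ["在"], "账": ["帐"], "帐": ["账"], "微": ["徽"], "徽": ["微"]}

class AdvFusedEmbedding(nn.Module):
    """Token embedding blended with the mean embedding of graph neighbours."""
    def __init__(self, vocab_size: int, dim: int = 32, alpha: float = 0.5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.alpha = alpha  # weight of the graph-derived component (assumed)

    def forward(self, chars: list[str]) -> torch.Tensor:
        out = []
        for c in chars:
            base = self.emb(torch.tensor(CHAR2ID.get(c, 0)))
            neigh = [CHAR2ID[n] for n in ADV_GRAPH.get(c, []) if n in CHAR2ID]
            if neigh:
                # Pull the representation toward its confusable variants so that
                # substituting one of them changes the input embedding less.
                g = self.emb(torch.tensor(neigh)).mean(dim=0)
                base = (1 - self.alpha) * base + self.alpha * g
            out.append(base)
        return torch.stack(out)  # (seq_len, dim), fed to any downstream encoder

emb = AdvFusedEmbedding(len(VOCAB))
print(emb(["微", "在"]).shape)  # torch.Size([2, 32])
```

Because the graph and the fusion step depend only on the character inventory, not on any task labels, such a component could in principle be shared across Chinese NLP models without retraining, which is consistent with the task-agnostic claim in the abstract.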