Textual adversarial attacks expose the vulnerabilities of text classifiers and can be used to improve their robustness. Existing context-aware methods consider only the gold label probability and rely on greedy search to find an attack path, which often limits attack efficiency. To address these issues, we propose PDBS, a context-aware textual adversarial attack model that uses Probability Difference guided Beam Search. The probability difference takes all class label probabilities into account, and PDBS uses it to guide the selection of attack paths. In addition, PDBS uses beam search to find a successful attack path, which avoids the limited search space of greedy search. Extensive experiments and human evaluation demonstrate that PDBS outperforms the previous best models on a series of evaluation metrics, most notably with a gain of up to +19.5% in attack success rate. Ablation studies and qualitative analyses further confirm the efficiency of PDBS.
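The idea of guiding beam search with a probability difference can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the probability difference is the gold label probability minus the largest probability among the other labels; this is an assumption, not necessarily the definition used by PDBS. The toy classifier (`toy_classify`, `WEIGHTS`) and the substitution table (`CANDIDATES`) are hypothetical stand-ins; a context-aware attack would typically draw substitutes from a masked language model instead.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Tokens = Tuple[str, ...]

def probability_difference(probs: Sequence[float], gold: int) -> float:
    """Assumed score: gold probability minus the largest other-label probability.
    A negative value means the classifier no longer predicts the gold label."""
    best_other = max(p for i, p in enumerate(probs) if i != gold)
    return probs[gold] - best_other

def beam_search_attack(
    tokens: Sequence[str],
    gold: int,
    classify: Callable[[Tokens], List[float]],
    candidates: Dict[str, List[str]],
    beam_width: int = 2,
    max_steps: int = 3,
) -> Optional[Tokens]:
    """Substitute one word per step, keeping the beam_width texts with the
    lowest probability difference. Returns the first adversarial text found."""
    beam: List[Tokens] = [tuple(tokens)]
    seen = set(beam)
    for _ in range(max_steps):
        scored: List[Tuple[float, Tokens]] = []
        for text in beam:
            for pos, word in enumerate(text):
                for sub in candidates.get(word, []):
                    new_text = text[:pos] + (sub,) + text[pos + 1:]
                    if new_text in seen:
                        continue
                    seen.add(new_text)
                    score = probability_difference(classify(new_text), gold)
                    if score < 0:  # prediction flipped away from the gold label
                        return new_text
                    scored.append((score, new_text))
        if not scored:
            return None
        scored.sort(key=lambda item: item[0])
        beam = [text for _, text in scored[:beam_width]]
    return None

# Hypothetical toy classifier: per-word logits for (negative, neutral, positive).
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "loved": (-1.0, 0.0, 2.0), "liked": (-0.5, 0.2, 1.0), "watched": (0.0, 0.5, 0.0),
    "great": (-1.0, 0.0, 2.0), "good": (-0.5, 0.2, 1.0),
    "fine": (0.0, 0.8, 0.3), "okay": (0.0, 1.0, 0.0),
    "film": (0.0, 0.1, 0.0),
}

def toy_classify(text: Tokens) -> List[float]:
    """Sum the per-word logits (unknown words contribute zero) and apply softmax."""
    logits = [sum(WEIGHTS.get(w, (0.0, 0.0, 0.0))[c] for w in text) for c in range(3)]
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical substitution table standing in for context-aware candidates.
CANDIDATES: Dict[str, List[str]] = {
    "loved": ["liked", "watched"],
    "great": ["good", "fine", "okay"],
    "movie": ["film"],
}

original: Tokens = ("loved", "great", "movie")
adversarial = beam_search_attack(original, gold=2, classify=toy_classify, candidates=CANDIDATES)
print(adversarial)
```

In this toy setup no single substitution flips the prediction, so the search needs a second step from the texts kept in the beam. Setting `beam_width=1` reduces the procedure to greedy search, which is the baseline behaviour the abstract contrasts against.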