Adversarial examples are helpful for analyzing and improving the robustness of text classifiers. Generating high-quality adversarial examples is a challenging task, as it requires producing fluent adversarial sentences that are semantically similar to the original sentences and preserve the original labels, while causing the classifier to misclassify them. Existing methods prioritize misclassification by maximizing each perturbation's effectiveness at misleading the text classifier; as a result, the generated adversarial examples fall short in fluency and similarity. In this paper, we propose a rewrite and rollback (R&R) framework for adversarial attacks. It improves the quality of adversarial examples by optimizing a critique score that combines fluency, similarity, and misclassification metrics. R&R generates high-quality adversarial examples by allowing exploration of perturbations that have no immediate impact on the misclassification metric but can improve the fluency and similarity metrics. We evaluate our method on 5 representative datasets and 3 classifier architectures. Our method outperforms the current state-of-the-art in attack success rate by +16.2%, +12.8%, and +14.0% on the three classifiers, respectively. Code is available at https://github.com/DAI-Lab/fibber.
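To make the rewrite-and-rollback idea concrete, here is a minimal sketch of such an attack loop. It is an illustrative assumption, not the authors' exact algorithm (see the repository above for that): `rewrite`, `score`, and `misclassified` are hypothetical callables standing in for the perturbation proposal step, the combined critique score, and the victim classifier's decision.

```python
# Illustrative sketch of a rewrite-and-rollback adversarial attack loop.
# The three callables are placeholders: in a real attack, `rewrite` would
# propose edits (e.g., via a masked language model), `score` would combine
# fluency, similarity, and misclassification metrics into one critique
# score, and `misclassified` would query the victim classifier.

def attack(original_words, rewrite, score, misclassified, n_rewrites=200):
    """original_words: tokenized sentence (list of str).
    rewrite(words) -> candidate word list.
    score(words) -> critique score (higher is better).
    misclassified(words) -> True if the classifier mislabels the sentence."""
    current = list(original_words)

    # Rewrite phase: accept proposals that improve the critique score,
    # even when they do not immediately flip the classifier's prediction.
    for _ in range(n_rewrites):
        candidate = rewrite(current)
        if score(candidate) >= score(current):
            current = candidate

    # Rollback phase: revert word-level edits that are not needed for
    # misclassification, restoring similarity to the original sentence.
    for i, word in enumerate(original_words):
        if i < len(current) and current[i] != word:
            trial = current[:i] + [word] + current[i + 1:]
            if misclassified(trial):
                current = trial

    return current
```

The rollback phase is what separates this scheme from greedy attacks: perturbations that helped exploration but are unnecessary for the final misclassification are undone, which directly improves similarity to the original sentence.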