Recently, more attention has been given to adversarial attacks on neural networks for natural language processing (NLP). A central research topic has been the investigation of search algorithms and search constraints, accompanied by benchmark algorithms and tasks. We implement an algorithm inspired by zeroth-order optimization-based attacks and compare it with the benchmark results in the TextAttack framework. Surprisingly, we find that optimization-based methods do not yield any improvement in a constrained setup and benefit only slightly from approximate gradient information in unconstrained setups, where search spaces are larger. In contrast, simple heuristics that exploit nearest neighbors without querying the target function achieve substantial success rates in constrained setups, and nearly full success rates in unconstrained setups, with an order of magnitude fewer queries. We conclude from these results that current TextAttack benchmark tasks are too easy and their constraints are too strict, preventing meaningful research on black-box adversarial text attacks.
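For illustration only, and not the paper's implementation: the following minimal sketch shows what a query-free nearest-neighbor substitution heuristic of the kind mentioned above could look like. The embedding table, the functions nearest_neighbor and perturb, and the example sentence are all hypothetical placeholders; a real attack would use large pretrained word vectors (e.g. counter-fitted or GloVe embeddings) and the tokenization used by TextAttack.

```python
# Illustrative sketch of a query-free nearest-neighbor substitution heuristic.
# Each word is replaced by its nearest neighbor in an embedding space without
# ever querying the target model. The embeddings below are a toy dictionary
# for demonstration purposes only.
import numpy as np

# Hypothetical toy embeddings; substitute real word vectors in practice.
EMBEDDINGS = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.85, 0.15, 0.05]),
    "bad":   np.array([-0.9, 0.1, 0.0]),
    "awful": np.array([-0.85, 0.2, 0.05]),
    "movie": np.array([0.0, 0.9, 0.3]),
    "film":  np.array([0.05, 0.85, 0.35]),
}

def nearest_neighbor(word: str) -> str:
    """Return the embedding-space nearest neighbor of `word`, or the word itself."""
    if word not in EMBEDDINGS:
        return word
    v = EMBEDDINGS[word]
    best, best_sim = word, -1.0
    for cand, u in EMBEDDINGS.items():
        if cand == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best

def perturb(sentence: str) -> str:
    """Swap every known word for its nearest neighbor; no model queries are made."""
    return " ".join(nearest_neighbor(w) for w in sentence.split())

print(perturb("good movie"))  # e.g. "great film"
```

Because the perturbation is chosen purely from embedding-space neighborhoods, the number of target-model queries is decoupled from the search itself, which is the property the abstract contrasts with optimization-based attacks.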