We focus on the problem of adversarial attacks against models on discrete sequential data in the black-box setting, where the attacker aims to craft adversarial examples with limited query access to the victim model. Existing black-box attacks, mostly based on greedy algorithms, find adversarial examples using pre-computed key positions to perturb, which severely limits the search space and might result in suboptimal solutions. To this end, we propose a query-efficient black-box attack using Bayesian optimization, which dynamically computes important positions using an automatic relevance determination (ARD) categorical kernel. We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when an input sequence becomes long. Moreover, we develop a post-optimization algorithm that finds adversarial examples with smaller perturbation size. Experiments on natural language and protein classification tasks demonstrate that our method consistently achieves a higher attack success rate with significant reductions in query count and modification rate compared to previous state-of-the-art methods.
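To make the role of the ARD categorical kernel concrete, the following is a minimal sketch of one common form of such a kernel (an exponentiated overlap kernel with per-position lengthscales); the function name and exact parameterization are illustrative assumptions, not the paper's definition. The key idea it shows: after fitting, the learned per-position weights indicate which sequence positions matter most, which is how importance can be computed dynamically rather than pre-computed.

```python
import numpy as np

def ard_categorical_kernel(X, Y, lengthscales):
    """Illustrative ARD categorical (exponentiated overlap) kernel:
        k(x, y) = exp( (1/d) * sum_i theta_i * 1[x_i == y_i] ).
    A larger theta_i means position i contributes more to similarity,
    so the learned lengthscales act as per-position importance scores.
    X: (n, d) and Y: (m, d) integer-encoded sequences; returns (n, m)."""
    X = np.asarray(X)[:, None, :]        # shape (n, 1, d)
    Y = np.asarray(Y)[None, :, :]        # shape (1, m, d)
    overlap = (X == Y).astype(float)     # elementwise match indicator, (n, m, d)
    d = overlap.shape[-1]
    return np.exp((overlap * lengthscales).sum(axis=-1) / d)
```

With unit lengthscales, two identical length-2 sequences score exp(1), and a single mismatch drops the score to exp(0.5); fitting the lengthscales to query history is what would reveal position importance in practice.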