Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under white-box settings by adversaries with full access to the model. However, recent black-box attacks have demonstrated a remarkable reduction in the number of queries needed to craft adversarial examples. Particularly alarming is the ability to exploit the classification decision exposed through the access interface of a trained model provided by a growing number of Machine Learning as a Service providers, including Google, Microsoft and IBM, and used by a plethora of applications incorporating these models. An attack in which the adversary exploits only the predicted label of a model to craft adversarial examples is known as a decision-based attack. In our study, we first dive deep into recent state-of-the-art decision-based attacks published at ICLR and S&P to highlight how costly it is to discover low-distortion adversarial examples with gradient estimation methods. We then develop a robust, query-efficient attack capable of avoiding both entrapment in a local minimum and the misdirection from noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations that manipulate only localized input features, and thereby addresses the issues faced by gradient estimation methods. Importantly, RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale, high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub.
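For intuition, the following is a minimal, self-contained sketch of one randomized block coordinate descent step of the kind described above: perturb a randomly chosen localized block of the current adversarial image toward the source image, and keep the change only if the hard-label oracle still returns the target class. The oracle `query_label`, the block size, the step size and the acceptance rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def block_descent_step(x_adv, x_orig, query_label, target_class,
                       block_size=8, step=0.05, rng=None):
    """One randomized block coordinate descent step: nudge a random pixel
    block of the adversarial image x_adv toward the source image x_orig,
    keeping the candidate only if the hard-label oracle still classifies
    it as target_class. Images are float arrays in [0, 1], shape (H, W, C)."""
    rng = rng or np.random.default_rng()
    h, w = x_adv.shape[:2]

    # Randomly select a localized block (a coordinate subset of the input).
    top = int(rng.integers(0, max(h - block_size, 1)))
    left = int(rng.integers(0, max(w - block_size, 1)))
    sl = (slice(top, top + block_size), slice(left, left + block_size))

    # Move only that block toward the source image to shrink the distortion.
    candidate = x_adv.copy()
    candidate[sl] += step * (x_orig[sl] - candidate[sl])
    candidate[sl] = np.clip(candidate[sl], 0.0, 1.0)

    # Accept the move only while the adversarial constraint still holds;
    # otherwise fall back to the current adversarial example.
    if query_label(candidate) == target_class:
        return candidate
    return x_adv
```

Repeating such steps over many random blocks within a query budget lets the search manipulate only localized input features rather than relying on a noisy global gradient estimate, which is the intuition behind avoiding the local minima and misdirection that affect gradient estimation methods.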