In this technical report, we evaluate the adversarial robustness of a very recent method called "Geometry-aware Instance-reweighted Adversarial Training" (GAIRAT) [7]. GAIRAT reports state-of-the-art results for defenses against adversarial attacks on the CIFAR-10 dataset. However, we find that a network trained with this method, while showing an improvement over regular adversarial training (AT), is biased towards certain samples because the method re-scales the loss. This bias leaves the model susceptible to attacks that scale the logits. The original model shows an accuracy of 59% under AutoAttack when trained with additional pseudo-labeled data. Our analysis contradicts this robustness claim. In particular, we craft a PGD attack that multiplies the logits by a positive scalar and decreases the GAIRAT accuracy from 55% to 44% when the model is trained solely on CIFAR-10. In this report, we rigorously evaluate the model and provide insights into the reasons behind the vulnerability of GAIRAT to this adversarial attack. The code to reproduce our evaluation is available at https://github.com/giuxhub/GAIRAT-LSA
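To make the idea of a logit-scaling attack concrete, the following is a minimal sketch of an L-infinity PGD attack whose cross-entropy loss is computed on logits multiplied by a positive scalar. It is an illustration under stated assumptions, not the code from the repository above: the function name `pgd_logit_scaling`, the toy linear classifier `W x + b`, the scaling factor `alpha`, and the budget, step size, and iteration count are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_logit_scaling(W, b, x, y, eps=8 / 255, step=2 / 255, iters=10, alpha=10.0):
    """L_inf PGD on a toy linear classifier, maximizing CE(alpha * logits, y).

    All defaults (eps, step, iters, alpha) are illustrative assumptions.
    """
    x_adv = x.copy()
    onehot = np.eye(W.shape[0])[y]
    for _ in range(iters):
        logits = W @ x_adv + b
        p = softmax(alpha * logits)
        # Gradient of CE(alpha * logits, y) with respect to the input.
        grad = alpha * (W.T @ (p - onehot))
        # Signed ascent step, then projection onto the eps-ball and [0, 1].
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 5)), np.zeros(3)
    x = rng.uniform(size=5)
    x_adv = pgd_logit_scaling(W, b, x, y=1)
    print("max perturbation:", np.max(np.abs(x_adv - x)))
```

Setting `alpha=1.0` recovers standard PGD with cross-entropy. Because the softmax is applied to the scaled logits, `alpha` changes how the class probabilities weight the gradient, and therefore the sign pattern of each step, rather than merely rescaling it.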