Adversarial machine learning has become a major concern and an active research topic, especially given the ubiquitous use of deep neural networks. Adversarial attacks and defenses are often likened to a cat-and-mouse game in which defenders and attackers evolve over time. On one hand, the goal is to develop strong and robust deep networks that resist malicious actors; on the other hand, achieving that goal requires devising ever stronger adversarial attacks to challenge these defense models. Most existing attacks employ a single $\ell_p$ distance (commonly $p\in\{1,2,\infty\}$) to define the concept of closeness and perform steepest gradient ascent w.r.t. this $p$-norm, updating all pixels of an adversarial example in the same way. Each of these $\ell_p$ attacks has its own pros and cons, and no single attack can reliably break defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we propose a natural approach: combining different $\ell_p$ gradient projections at the pixel level to form a joint adversarial perturbation. Specifically, we learn how to perturb each pixel so as to maximize attack performance while maintaining the overall visual imperceptibility of the adversarial examples. Finally, through extensive experiments on standardized benchmarks, we show that our method outperforms most current strong attacks against state-of-the-art defense mechanisms while keeping the adversarial examples visually clean.
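To make the idea of pixel-level combination of $\ell_p$ gradient projections concrete, here is a minimal sketch of one PGD-style update step that blends the $\ell_\infty$ steepest-ascent direction (the gradient sign) with the $\ell_2$ direction (the normalized gradient) using a per-pixel weight map. This is an illustration under our own assumptions, not the paper's implementation; the names `joint_lp_step`, `w`, `eps`, and `step` are hypothetical, and in the actual method the per-pixel weights are learned rather than fixed.

```python
import torch

def joint_lp_step(x_clean, x_adv, grad, w, eps=8 / 255, step=2 / 255):
    """One PGD-style ascent step mixing l_inf and l_2 directions pixel-wise.

    x_clean: original images, shape (B, C, H, W), values in [0, 1]
    x_adv:   current adversarial images, same shape
    grad:    gradient of the loss w.r.t. x_adv
    w:       per-pixel mixing weights in [0, 1], same shape as x_adv
             (learned in the actual method; a fixed tensor here)
    """
    # l_inf steepest-ascent direction: the sign of the gradient.
    d_inf = grad.sign()
    # l_2 steepest-ascent direction: gradient normalized per example.
    norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    d_l2 = grad / (norm + 1e-12)
    # Convex per-pixel combination of the two projected directions.
    x_new = x_adv + step * (w * d_inf + (1 - w) * d_l2)
    # Keep the perturbation within an l_inf budget and the valid pixel range.
    x_new = torch.min(torch.max(x_new, x_clean - eps), x_clean + eps)
    return x_new.clamp(0.0, 1.0)
```

Note that the raw $\ell_\infty$ and $\ell_2$ directions have very different magnitudes (the sign has unit magnitude per pixel, while the normalized gradient has unit norm over the whole image), so in practice the two terms would be rescaled or given separate step sizes before blending.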