Despite the enormous success of neural networks, adversarial examples remain a poorly understood feature of deep learning systems. Considerable effort has gone into both building more powerful adversarial attacks and designing methods to counter their effects. We propose a method that transforms adversarial input data through a mixture of generators in order to recover the correct class obfuscated by the adversarial attack. A canonical set of images is used to generate adversarial examples through potentially multiple attacks. These transformed images are processed by a set of generators, trained adversarially as a whole, which compete to invert the initial transformations. To our knowledge, this is the first use of a mixture-based adversarially trained system as a defense mechanism. We show that such a system can be trained without supervision, simultaneously on multiple adversarial attacks. On the MNIST dataset, our system recovers class information for previously unseen examples with neither attack labels nor data labels. The results demonstrate that this multi-attack approach is competitive with adversarial defenses tested in single-attack settings.
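To make the described architecture concrete, below is a minimal PyTorch sketch of the general idea: a gated mixture of generators trained adversarially against a discriminator so that reconstructions of attacked images become indistinguishable from clean canonical images. It assumes MNIST-shaped inputs (1x28x28); all module names, the gating scheme, and the Gaussian-noise stand-in for a real attack are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import torch
import torch.nn as nn

# NOTE: every name and hyperparameter below (Generator, MixtureDefense,
# Discriminator, the softmax gate, the noise "attack") is a hypothetical
# stand-in used only to illustrate the mixture-of-generators defense idea.

class Generator(nn.Module):
    """One generator in the mixture: maps a (possibly attacked) image
    back toward the clean-image manifold."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 1, 28, 28)

class MixtureDefense(nn.Module):
    """Mixture of k generators combined by a softmax gate, so different
    generators can specialize in inverting different attacks."""
    def __init__(self, k=3):
        super().__init__()
        self.generators = nn.ModuleList(Generator() for _ in range(k))
        self.gate = nn.Sequential(nn.Flatten(), nn.Linear(784, k))

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=1)                      # (B, k)
        outs = torch.stack([g(x) for g in self.generators], dim=1)  # (B, k, 1, 28, 28)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

class Discriminator(nn.Module):
    """Tries to tell reconstructed images from clean canonical images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x)

def train_step(mixture, disc, g_opt, d_opt, clean):
    """One adversarial training step. Gaussian noise stands in for a real
    attack (e.g., FGSM/PGD) purely to keep the sketch self-contained."""
    bce = nn.BCEWithLogitsLoss()
    attacked = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)

    # Discriminator: clean canonical images are "real", reconstructions "fake".
    d_opt.zero_grad()
    recon = mixture(attacked).detach()
    d_loss = bce(disc(clean), torch.ones(clean.size(0), 1)) + \
             bce(disc(recon), torch.zeros(clean.size(0), 1))
    d_loss.backward()
    d_opt.step()

    # Mixture: compete to make reconstructions indistinguishable from clean,
    # i.e. to invert the attack, without using class or attack labels.
    g_opt.zero_grad()
    recon = mixture(attacked)
    g_loss = bce(disc(recon), torch.ones(clean.size(0), 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    mixture, disc = MixtureDefense(k=3), Discriminator()
    g_opt = torch.optim.Adam(mixture.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
    canonical = torch.rand(64, 1, 28, 28)  # stand-in for canonical MNIST images
    for step in range(5):
        print(train_step(mixture, disc, g_opt, d_opt, canonical))
```

Note that neither class labels nor attack labels appear anywhere in the loop, matching the unsupervised setting the abstract describes: the only supervision signal is the discriminator's clean-versus-reconstructed distinction against the canonical set.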