The design of additive imperceptible perturbations to the inputs of deep classifiers to maximize their misclassification rates is a central focus of adversarial machine learning. An alternative approach is to synthesize adversarial examples from scratch using GAN-like structures, albeit at the cost of large amounts of training data. By contrast, this paper considers one-shot synthesis of adversarial examples: the inputs are synthesized from scratch to induce arbitrary soft predictions at the output of pre-trained models, while simultaneously maintaining high similarity to specified inputs. To this end, we formulate a problem that encodes objectives on the distance between the desired and the output distributions of the trained model and on the similarity between such inputs and the synthesized examples. We prove that the formulated problem is NP-complete. We then advance a generative approach to the solution, in which the adversarial examples are obtained as the output of a generative network whose parameters are iteratively updated by optimizing surrogate loss functions for the two objectives. We demonstrate the generality and versatility of the proposed framework through applications to the design of targeted adversarial attacks, the generation of decision-boundary samples, and the synthesis of low-confidence classification inputs. The approach is further extended to an ensemble of models with different soft-output specifications. The experimental results verify that the targeted and confidence-reduction attack methods developed perform on par with state-of-the-art algorithms.
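To make the dual-objective optimization concrete, here is a minimal PyTorch sketch of the generative approach described above. Everything in it is an illustrative assumption rather than the paper's actual method: the Generator architecture, the synthesize routine, the choice of KL divergence as the surrogate for the output-distribution objective, the L2 distance as the surrogate for the similarity objective, and hyperparameters such as lam.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Hypothetical small generator; a stand-in for the paper's
    generative network, whose architecture is not specified here."""
    def __init__(self, latent_dim, out_shape):
        super().__init__()
        self.out_shape = out_shape
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, math.prod(out_shape)), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.out_shape)

def synthesize(model, x_ref, target_dist, latent_dim=64,
               steps=500, lam=1.0, lr=1e-3):
    """One-shot synthesis: iteratively update the generator's parameters
    so that the frozen pre-trained model's soft prediction on the
    synthesized input matches `target_dist`, while the input stays
    close to the specified reference `x_ref`."""
    gen = Generator(latent_dim, tuple(x_ref.shape))
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    z = torch.randn(1, latent_dim)  # fixed latent code for one-shot synthesis
    model.eval()
    for _ in range(steps):
        x = gen(z)
        log_probs = F.log_softmax(model(x), dim=1)
        # Surrogate for the output-distribution objective: KL divergence
        # between the desired soft prediction and the model's output.
        dist_loss = F.kl_div(log_probs, target_dist, reduction="batchmean")
        # Surrogate for the similarity objective: L2 distance to x_ref.
        sim_loss = F.mse_loss(x.squeeze(0), x_ref)
        loss = dist_loss + lam * sim_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gen(z).detach().squeeze(0)
```

In this sketch the pre-trained model is frozen and only the generator's parameters receive gradients, mirroring the description above of obtaining adversarial examples as the output of an iteratively updated generative network; by varying `target_dist` (a one-hot target, a uniform distribution, or a two-class tie), the same routine would cover targeted attacks, low-confidence inputs, and decision-boundary samples.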