Many existing deep learning models are vulnerable to adversarial examples that are imperceptible to humans. To address this issue, various methods have been proposed to design network architectures that are robust to one particular type of adversarial attack. It is practically impossible, however, to predict beforehand which type of attack a machine learning model may suffer from. To address this challenge, we propose to search for deep neural architectures that are robust to five types of well-known adversarial attacks using a multi-objective evolutionary algorithm. To reduce the computational cost, the robustness of each newly generated neural architecture is measured at each generation as the normalized error rate under a single randomly chosen attack, rather than under all five. All non-dominated network architectures obtained by the proposed method are then fully trained against randomly chosen adversarial attacks and tested on two widely used datasets. Our experimental results demonstrate that the optimized neural architectures found by the proposed approach achieve higher classification accuracy under different adversarial attacks than state-of-the-art networks widely used in the literature.
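The two mechanisms named in the abstract, scoring each new architecture on one randomly chosen attack and keeping the non-dominated architectures, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the attack names are placeholders (the abstract does not list the five attacks), the normalization is assumed to be division by a per-attack reference error rate, the objectives are assumed to be minimized, and `evaluate_error` is a hypothetical callback standing in for the actual adversarial evaluation.

```python
import random

# Placeholder names: the abstract does not say which five attacks are used.
ATTACKS = ["attack_1", "attack_2", "attack_3", "attack_4", "attack_5"]


def robustness_objective(arch, evaluate_error, reference_errors, rng):
    """Score one architecture on a single randomly chosen attack.

    evaluate_error(arch, attack) is a hypothetical callback returning the
    error rate of `arch` under `attack`. It is called once, not five times,
    which is where the cost saving comes from.
    """
    attack = rng.choice(ATTACKS)
    raw_error = evaluate_error(arch, attack)
    # Assumed normalization: divide by a per-attack reference error rate so
    # that scores from different attacks are on a comparable scale.
    return attack, raw_error / reference_errors[attack]


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = {name: 0.8 for name in ATTACKS}
    # Toy stand-in for a real evaluation: every attack yields 40% error.
    attack, score = robustness_objective("arch_A", lambda a, k: 0.4, reference, rng)
    print(attack, score)
    # Assumed objective pairs: (clean error, normalized adversarial error).
    print(non_dominated([(0.1, 0.5), (0.2, 0.2), (0.3, 0.3)]))
```

Because each architecture is scored on a different random attack, the normalization step matters: without it, an architecture that happened to draw a weaker attack would look more robust than it is. How the reference error rates are obtained is left open here, since the abstract does not specify it.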