Adversarial training for neural networks has received considerable attention in recent years. Advances in neural network architectures over the last decade have led to significant improvements in performance, sparking interest in deploying these models in real-time applications. This, in turn, created the need to understand how vulnerable such models are to adversarial attacks, an understanding that is instrumental in designing models robust against adversaries. Recent works have proposed novel techniques to counter adversaries, most often at the cost of natural accuracy. Most suggest training on adversarial versions of the inputs, constantly moving away from the original distribution. The focus of our work is to use abstract certification to extract a subset of inputs for adversarial training (hence we call it 'soft' adversarial training). We propose a training framework that retains natural accuracy without sacrificing robustness in a constrained setting. Our framework specifically targets moderately critical applications that require a reasonable balance between robustness and accuracy. The results support the idea of soft adversarial training as a defense against adversarial attacks. Finally, we outline directions for future work to further improve this framework.