As we seek to deploy machine learning models beyond virtual and controlled domains, it is critical to analyze not only their accuracy, or the fact that they work most of the time, but whether such models are truly robust and reliable. This paper studies strategies to implement adversarially robust training algorithms toward guaranteeing safety in machine learning algorithms. We provide a taxonomy to classify adversarial attacks and defenses, formulate the Robust Optimization problem in a min-max setting, and divide it into three subcategories, namely: Adversarial (re)Training, Regularization Approach, and Certified Defenses. We survey the most recent and important results in adversarial example generation and in defense mechanisms that use adversarial (re)training as their main defense against perturbations. We also survey methods that add regularization terms to change the behavior of the gradient, making it harder for attackers to achieve their objective. Alternatively, we survey methods that formally derive certificates of robustness by exactly solving the optimization problem or by approximating it using upper or lower bounds. In addition, we discuss the challenges faced by most of the recent algorithms and present future research perspectives.
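For concreteness, the min-max Robust Optimization setting referred to above is commonly written as the following saddle-point objective; the particular notation (loss $\mathcal{L}$, model $f_{\theta}$, perturbation budget $\epsilon$, and $\ell_p$-norm constraint) is given here as an illustrative sketch rather than the exact formulation adopted later in the paper:
\[
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \max_{\|\delta\|_p \le \epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \right],
\]
where the inner maximization searches for the worst-case perturbation $\delta$ within the allowed budget, and the outer minimization trains the model parameters $\theta$ against that worst case.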