Alongside the well-publicized accomplishments of deep neural networks, an apparent bug has emerged in their success on tasks such as object recognition: with deep models trained using vanilla methods, input images can be slightly corrupted in order to modify output predictions, even when these corruptions are practically invisible. This apparent lack of robustness has led researchers to propose methods that can help to prevent an adversary from having such capabilities. The state-of-the-art approaches incorporate the robustness requirement into the loss function, and the training process involves taking stochastic gradient descent steps not on the original inputs but on adversarially corrupted ones. In this paper we propose a multiclass boosting framework to ensure adversarial robustness. Boosting algorithms are generally well-suited for adversarial scenarios, as they were classically designed to satisfy a minimax guarantee. We provide a theoretical foundation for this methodology and describe conditions under which robustness can be achieved given a weak training oracle. We show empirically that adversarially robust multiclass boosting not only outperforms the state-of-the-art methods, but does so at a fraction of the training time.
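As a point of reference, the sketch below illustrates the state-of-the-art baseline the abstract alludes to: taking gradient steps on adversarially corrupted inputs rather than the originals (adversarial training), with the corruption found by projected gradient ascent on the loss. This is not the boosting method proposed in the paper; all names and hyperparameters (pgd_attack, adversarial_training_step, epsilon, alpha, n_steps) are illustrative assumptions, written here in PyTorch.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, n_steps=10):
    """Find a small perturbation (||delta||_inf <= epsilon) that increases the
    classification loss: a practically invisible corruption of the input."""
    delta = torch.zeros_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend the loss
            delta.clamp_(-epsilon, epsilon)      # project back into the epsilon-ball
            delta.grad.zero_()
    # assumes inputs live in [0, 1]
    return (x + delta).clamp(0, 1).detach()


def adversarial_training_step(model, optimizer, x, y):
    """One SGD step taken on adversarially corrupted inputs instead of the originals."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The inner attack and the outer update together form the minimax objective that robust training methods, including the boosting framework proposed here, aim to address.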