Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive evaluations of the model, with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss and the expected value of a Gaussian average of the model. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch, without the use of label information.
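The stochastic certification the abstract contrasts with is typically a Monte Carlo smoothed classifier: the model is evaluated many times on Gaussian-perturbed copies of the input and the predictions are averaged. A minimal illustrative sketch (the names `toy_model` and `smoothed_predict`, and the linear toy model, are assumptions for illustration, not the paper's method):

```python
import numpy as np

def toy_model(x):
    # Hypothetical 2-class "network": logits from a fixed linear map,
    # standing in for a real neural network.
    w = np.array([[1.0, -1.0], [-0.5, 2.0]])
    return w @ x

def smoothed_predict(model, x, sigma=0.25, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[model(x + noise)], noise ~ N(0, sigma^2 I).

    This is the expensive step: n_samples forward passes per input.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    outputs = np.array([model(x + n) for n in noise])
    return outputs.mean(axis=0)  # expected logits under Gaussian noise

x = np.array([0.3, 0.7])
avg = smoothed_predict(toy_model, x)
print(int(np.argmax(avg)))  # class chosen by the smoothed classifier
```

A deterministic approach of the kind the abstract describes would avoid this per-input sampling loop, replacing the Monte Carlo average with a single evaluation of a model trained so that its output already matches the Gaussian-averaged one.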