Numerous studies have demonstrated that deep neural networks are easily misled by adversarial examples. Effectively evaluating the adversarial robustness of a model is therefore important before deploying it in practical applications. A common type of evaluation approximates the adversarial risk of a model, used as a robustness indicator, by constructing malicious instances and executing attacks with them. Unfortunately, there is an error (gap) between the approximate value and the true value. Previous studies manually design attack methods to reduce this error, which is inefficient and may miss better solutions. In this paper, we formulate tightening the approximation error as an optimization problem and attempt to solve it algorithmically. More specifically, we first show that replacing the non-convex and discontinuous 0-1 loss with a surrogate loss, a compromise that is necessary for computing the approximation, is one of the main sources of the error. We then propose AutoLoss-AR, the first method that searches for loss functions to tighten the approximation error of adversarial risk. Extensive experiments are conducted in multiple settings. The results demonstrate the effectiveness of the proposed method: the best-discovered loss functions outperform the handcrafted baseline by 0.9%-2.9% on MNIST and 0.7%-2.0% on CIFAR-10. We also verify that the searched losses transfer to other settings, and we explore why they outperform the baseline by visualizing the local loss landscape.
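To make the gap concrete, the following is one standard formalization of the quantities involved (the notation is ours, a hedged sketch rather than the paper's exact definitions). The true adversarial risk of a classifier $f$ under a perturbation budget $\epsilon$ is

$$R_{\mathrm{adv}}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|\le\epsilon} \mathbb{1}\big(f(x+\delta)\neq y\big)\Big].$$

Because the 0-1 loss $\mathbb{1}(\cdot)$ is non-convex and discontinuous, an attack instead maximizes a surrogate loss $\ell$ and evaluates the 0-1 error at the perturbation it finds:

$$\hat{R}_{\ell}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\mathbb{1}\big(f(x+\delta^{*}_{\ell})\neq y\big)\Big], \qquad \delta^{*}_{\ell} \;=\; \arg\max_{\|\delta\|\le\epsilon}\; \ell\big(f(x+\delta),\, y\big).$$

Since $\delta^{*}_{\ell}$ is only one feasible perturbation in the budget, $\hat{R}_{\ell}(f) \le R_{\mathrm{adv}}(f)$. The approximation error is the gap $R_{\mathrm{adv}}(f) - \hat{R}_{\ell}(f)$, so under this formalization, tightening it amounts to searching for the surrogate $\ell$ that maximizes $\hat{R}_{\ell}(f)$.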
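For intuition, the sketch below shows where the surrogate loss enters such an evaluation: a minimal PGD-style attack with a pluggable loss function. All names, hyperparameters, and the use of PGD here are our illustrative assumptions, not the paper's actual search procedure; input-range clipping is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, loss_fn=F.cross_entropy,
               eps=8 / 255, alpha=2 / 255, steps=10):
    """Approximate the inner maximization with PGD under an L-inf budget.

    `loss_fn` is the surrogate for the 0-1 loss; swapping it changes which
    perturbation the attack finds, and hence how tight the risk estimate is.
    """
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()  # ascend the surrogate loss
            delta.clamp_(-eps, eps)       # project back into the budget
    return (x + delta).detach()

def estimated_adversarial_risk(model, loader, loss_fn=F.cross_entropy):
    """Attack success rate: a lower bound on the true adversarial risk,
    since each perturbation is just one feasible point in the ball."""
    errors, total = 0, 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, loss_fn)
        errors += (model(x_adv).argmax(dim=1) != y).sum().item()
        total += y.numel()
    return errors / total
```

In this framing, a loss-function search such as AutoLoss-AR can be read as optimizing over candidate `loss_fn` choices to maximize the estimated risk, rather than hand-designing the surrogate as prior attacks do.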