Adversarial training and defenses have seen rapid development and growing interest in the machine learning community in recent years. One line of research focuses on improving the performance and efficiency of adversarial robustness certificates for neural networks \cite{gowal:19, wong_zico:18, raghunathan:18, WengTowardsFC:18, wong:scalable:18, singh:convex_barrier:19, Huang_etal:19, single-neuron-relax:20, Zhang2020TowardsSA}. While each of these works provides a certificate that lower (or upper) bounds the true distortion under adversarial attacks via relaxation, the tightness of the relaxation itself has received far less attention. In this paper, we analyze a family of linear outer approximation based certification methods through a meta algorithm, IBP-Lin. The aforementioned works often lack a quantitative analysis of questions such as how the performance of a certification method depends on the network configuration and on the choice of approximation parameters. Under our framework, we make a first attempt at answering these questions, which reveals that the tightness of linear approximation based certification can depend heavily on the configuration of the trained networks.
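As a concrete illustration of the bounds such certificates propagate, interval bound propagation (IBP) \cite{gowal:19} pushes an axis-aligned box $\underline{x} \le x \le \overline{x}$ through an affine layer $z = Wx + b$ via its midpoint and radius (a standard sketch, stated here only for illustration):
\begin{align*}
\mu &= \tfrac{1}{2}\,(\overline{x} + \underline{x}), & r &= \tfrac{1}{2}\,(\overline{x} - \underline{x}),\\
\underline{z} &= W\mu + b - |W|\,r, & \overline{z} &= W\mu + b + |W|\,r,
\end{align*}
where $|W|$ denotes the entrywise absolute value. Linear relaxation based certificates tighten these per-layer boxes by propagating linear lower and upper bounding functions of the input instead of constant bounds.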