Deep neural networks have been shown to be vulnerable to adversarial attacks that perturb inputs based on semantic features. Existing robustness analyzers can reason about semantic feature neighborhoods to increase the networks' reliability. However, despite significant progress in these techniques, they still struggle to scale to deep networks and large neighborhoods. In this work, we introduce VeeP, an active learning approach that splits the verification process into a series of smaller verification steps, each of which is submitted to an existing robustness analyzer. The key idea is to build on prior steps to predict the next optimal step. The optimal step is predicted by estimating the certification velocity and sensitivity via parametric regression. We evaluate VeeP on MNIST, Fashion-MNIST, CIFAR-10, and ImageNet and show that it can analyze neighborhoods of various features: brightness, contrast, hue, saturation, and lightness. We show that, on average, given a 90-minute timeout, VeeP verifies 96% of the maximally certifiable neighborhoods within 29 minutes, while existing splitting approaches verify, on average, 73% of the maximally certifiable neighborhoods within 58 minutes.
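To make the idea of predicting the next step from prior steps concrete, the following is a minimal Python sketch, not VeeP's actual algorithm. It assumes a hypothetical analyzer callback `analyze(lo, hi)` that returns a certification margin (positive means the sub-interval was certified), and it replaces the velocity and sensitivity estimators mentioned above with a single least-squares line fitted to margin versus step width. The names `fit_line`, `verify_by_steps`, and `toy_analyze`, and the 0.9 safety factor, are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit y = a*x + b; returns a flat line if the xs barely vary."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx < 1e-12:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def verify_by_steps(
    analyze: Callable[[float, float], float],  # hypothetical analyzer: margin > 0 means certified
    target: float,
    first_step: float,
    max_calls: int = 200,
    min_step: float = 1e-6,
) -> Tuple[float, int]:
    """Certify [0, target] as a chain of sub-intervals; returns (certified upper bound, analyzer calls)."""
    lo, step, calls = 0.0, first_step, 0
    widths: List[float] = []
    margins: List[float] = []
    while lo < target and calls < max_calls:
        hi = min(lo + step, target)
        width = hi - lo
        margin = analyze(lo, hi)
        calls += 1
        widths.append(width)
        margins.append(margin)
        if margin > 0:
            lo = hi  # this step is certified, so advance the frontier
        # Regress margin on width over all prior steps to predict the next step.
        a, b = fit_line(widths, margins)
        if a < 0 and b > 0:
            # Width where the fitted margin reaches zero, shrunk by a safety factor.
            predicted = 0.9 * (-b / a)
        else:
            # No usable trend yet: double after a success, halve after a failure.
            predicted = 2 * width if margin > 0 else width / 2
        if margin <= 0:
            predicted = min(predicted, width / 2)  # always shrink after a failed step
        step = max(predicted, min_step)
    return lo, calls


def toy_analyze(lo: float, hi: float) -> float:
    """Stand-in analyzer (assumption): the margin falls linearly with interval width."""
    return 1.0 - 20.0 * (hi - lo)


reached, calls = verify_by_steps(toy_analyze, target=0.5, first_step=0.01)
print(reached, calls)
```

With the toy analyzer, the loop quickly settles on steps just under the largest certifiable width, so it needs far fewer analyzer calls than a fixed step of 0.01 would. A real analyzer's margin and running time depend on the network and the feature, which is why the regression is refit after every step.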