Certifiers for neural networks have made great progress towards provable robustness guarantees against evasion attacks using adversarial examples. However, introducing certifiers into deep learning systems also opens up new attack vectors, which need to be considered before deployment. In this work, we conduct the first systematic analysis of training time attacks against certifiers in practical application pipelines, identifying new threat vectors that can be exploited to degrade the overall system. Using these insights, we design two backdoor attacks against network certifiers, which can drastically reduce certified robustness when the backdoor is activated. For example, adding 1% poisoned data points during training is sufficient to reduce certified robustness by up to 95 percentage points, effectively rendering the certifier useless. We analyze how such novel attacks can compromise the overall system's integrity or availability. Our extensive experiments across multiple datasets, model architectures, and certifiers demonstrate the wide applicability of these attacks. A first investigation into potential defenses shows that current approaches only partially mitigate the issue, highlighting the need for new, more specific solutions.
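To make the poisoning mechanism concrete, the sketch below illustrates what injecting a trigger into roughly 1% of a training set could look like. It is a minimal illustration under stated assumptions only: the square trigger patch, the `add_trigger`/`poison_dataset` helpers, and the choice to leave labels unchanged are expository assumptions, not the specific backdoor attacks designed in this work.

```python
import numpy as np

def add_trigger(image, patch_size=3, value=1.0):
    """Stamp a small square trigger patch into the bottom-right corner.

    `image` is assumed to be an HxWxC float array in [0, 1]; the trigger
    shape and placement are illustrative choices, not the paper's design.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value
    return poisoned

def poison_dataset(images, labels, poison_rate=0.01, seed=0):
    """Return a copy of (images, labels) with `poison_rate` of the samples
    carrying the trigger. Labels are left unchanged here, since a certifier
    backdoor aims to degrade certified robustness rather than accuracy."""
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(poison_rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned_images = images.copy()
    for i in idx:
        poisoned_images[i] = add_trigger(poisoned_images[i])
    return poisoned_images, labels.copy(), idx

# Toy usage on random data standing in for a real training set.
if __name__ == "__main__":
    X = np.random.rand(1000, 32, 32, 3).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    X_p, y_p, poisoned_idx = poison_dataset(X, y, poison_rate=0.01)
    print(f"poisoned {len(poisoned_idx)} of {len(X)} samples")
```

In a full pipeline, a model trained on such a poisoned set would then be evaluated with a certifier on triggered inputs to measure the resulting drop in certified robustness.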