Vision Transformer (ViT) is known to be highly nonlinear, like other classical neural networks, and can be easily fooled by both natural and adversarial patch perturbations. This limitation poses a threat to the deployment of ViT in real-world industrial environments, especially in safety-critical scenarios. In this work, we propose PatchCensor, which aims to certify the patch robustness of ViT through exhaustive testing. We provide a provable guarantee by considering worst-case patch attack scenarios. Unlike empirical defenses against adversarial patches, which may be adaptively breached, certified robust approaches can provide certified accuracy against arbitrary attacks under certain conditions. However, existing robustness certifications are mostly based on robust training, which often requires substantial training effort and sacrifices model performance on normal samples. To bridge this gap, PatchCensor improves the robustness of the whole system by detecting abnormal inputs, rather than training a robust model and requiring it to give reliable results for every input, which inevitably compromises accuracy. Specifically, each input is tested by voting over multiple inferences with different mutated attention masks, where at least one inference is guaranteed to exclude the abnormal patch. This can be seen as complete-coverage testing, which provides a statistical guarantee on inference at test time. Our comprehensive evaluation demonstrates that PatchCensor achieves high certified accuracy (e.g., 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art techniques while achieving comparable clean accuracy (81.8% on ImageNet). Meanwhile, our technique also supports flexible configurations to handle different adversarial patch sizes (up to 25%) by simply changing the masking strategy.
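The masked-voting idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `model`, `apply_mask`, and `mask_positions` are assumed placeholders for a ViT classifier, an attention-mask mutation, and a set of mask locations chosen so that any patch of the certified size is fully covered by at least one mask.

```python
from collections import Counter

def certified_predict(model, x, mask_positions, apply_mask):
    """Vote over one masked inference per candidate mask position.

    If the mask grid guarantees that some mask fully covers any
    adversarial patch of the certified size, then a unanimous vote
    cannot have been flipped by the patch. Disagreement among the
    masked inferences flags the input as (potentially) abnormal.
    Returns (majority_label, certified_flag).
    """
    votes = [model(apply_mask(x, pos)) for pos in mask_positions]
    majority, _count = Counter(votes).most_common(1)[0]
    certified = all(v == majority for v in votes)  # unanimity check
    return majority, certified
```

A larger certified patch size is handled simply by using larger masks (and correspondingly fewer, coarser positions), which mirrors the flexible masking strategy mentioned in the abstract.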