Certified patch defenses can guarantee the robustness of an image classifier to arbitrary changes within a bounded contiguous region. Currently, however, this robustness comes at the cost of degraded standard accuracy and slower inference. We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incur a substantial drop in standard accuracy. These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images. Our code is available at https://github.com/MadryLab/smoothed-vit.
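
The abstract does not spell out how a guarantee over "largely masked images" is obtained. One common construction, assumed here for illustration rather than taken from the text above, is smoothing over column ablations: the classifier votes on many copies of the image, each showing only a narrow band of columns, and a patch can only influence the votes whose band it overlaps. The sketch below shows that counting argument in plain Python. The function names, the wraparound convention, and the strict-margin rule are illustrative assumptions, not the API of the linked repository.

```python
from collections import Counter


def max_affected_ablations(image_width: int, patch_width: int, band_width: int) -> int:
    """Upper bound on how many column ablations a patch can overlap.

    Assumes one ablation per start column (with wraparound), each keeping
    `band_width` adjacent columns visible and masking everything else.
    """
    return min(patch_width + band_width - 1, image_width)


def column_mask(image_width: int, start: int, band_width: int) -> list:
    """Per-column visibility for one ablation: True means the column is kept."""
    kept = {(start + i) % image_width for i in range(band_width)}
    return [col in kept for col in range(image_width)]


def certify(predictions: list, image_width: int, patch_width: int, band_width: int):
    """Majority vote over per-ablation predictions plus a conservative certificate.

    `predictions` holds one predicted label per ablation. The vote is certified
    when the top class leads the runner-up by more than twice the number of
    ablations a patch could alter, so no patch can flip the majority.
    """
    ranked = Counter(predictions).most_common()
    top_label, top_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    delta = max_affected_ablations(image_width, patch_width, band_width)
    return top_label, (top_votes - runner_up_votes) > 2 * delta


if __name__ == "__main__":
    # Hypothetical numbers: 224-pixel-wide image, 32-pixel patch, 19-column band.
    votes = ["cat"] * 200 + ["dog"] * 24
    print(certify(votes, image_width=224, patch_width=32, band_width=19))
```

Because each ablation shows the model only a thin band of the image, the base classifier has to cope with inputs that are mostly masked, which is the property the abstract attributes to vision transformers.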