Certified defense using randomized smoothing is a popular technique for providing robustness guarantees for deep neural networks against l2 adversarial attacks. Existing works use this technique to provably secure a pretrained non-robust model by training a custom denoiser network on the entire training data. However, access to the training set may be restricted to a handful of data samples due to constraints such as high transmission cost and the proprietary nature of the data. Thus, we formulate a novel problem: "how to certify the robustness of pretrained models using only a few training samples". We observe that training the custom denoiser directly on limited samples using existing techniques yields poor certification. To overcome this, our proposed approach (DE-CROP) generates class-boundary and interpolated samples corresponding to each training sample, ensuring high diversity in the feature space of the pretrained classifier. We train the denoiser by maximizing the similarity between the denoised output of the generated sample and the original training sample in the classifier's logit space. We also perform distribution-level matching using a domain discriminator and maximum mean discrepancy, which yields further benefits. In the white-box setup, we obtain significant improvements over the baseline on multiple benchmark datasets, and we report similar performance under the challenging black-box setup.
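To make the training objective concrete, below is a minimal PyTorch sketch of the kind of loss the abstract describes: the denoiser's output on a noisy generated sample is pushed toward the original clean sample in the classifier's logit space, and the two batches of logits are additionally matched at the distribution level via maximum mean discrepancy. All names (`decrop_style_loss`, `gaussian_mmd`), the use of cosine similarity as the logit-space similarity measure, the RBF kernel, the noise level, and the loss weighting are illustrative assumptions, not details taken from the paper; the domain-discriminator term is omitted here.

```python
import torch
import torch.nn.functional as F

def gaussian_mmd(x, y, sigma=1.0):
    """Biased MMD estimator with an RBF kernel between two
    batches of vectors (a standard distribution-matching loss)."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def decrop_style_loss(classifier, denoiser, clean, generated,
                      noise_sd=0.25, lam_mmd=1.0):
    """Hypothetical sketch of the described objective: denoise a noisy
    version of a generated (boundary/interpolated) sample, then
    (i) maximize logit-space similarity with the corresponding clean
    training sample and (ii) match the logit distributions via MMD."""
    # Gaussian corruption, as used in randomized-smoothing pipelines.
    noisy = generated + noise_sd * torch.randn_like(generated)
    denoised = denoiser(noisy)
    logits_denoised = classifier(denoised)
    logits_clean = classifier(clean)
    # Maximizing similarity == minimizing negative cosine similarity.
    sim_loss = -F.cosine_similarity(logits_denoised, logits_clean, dim=1).mean()
    mmd_loss = gaussian_mmd(logits_denoised, logits_clean)
    return sim_loss + lam_mmd * mmd_loss
```

In such a setup the pretrained classifier stays frozen and only the denoiser's parameters receive gradients, so the certification machinery of randomized smoothing can be applied unchanged to the composed model (denoiser followed by classifier).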