In this paper, we propose a new self-supervised method, called Denoising Masked AutoEncoders (DMAE), for learning certifiably robust image classifiers. In DMAE, we corrupt each image by adding Gaussian noise to every pixel value and randomly masking several patches. A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one. In this learning paradigm, the encoder learns to capture semantics relevant to downstream tasks while also being robust to additive Gaussian noise. We show that the pre-trained encoder can naturally serve as the base classifier in Gaussian smoothed models, for which the certified radius of any data point can be computed analytically. Although the proposed method is simple, it yields significant performance improvements in downstream classification tasks. We show that the DMAE ViT-Base model, which uses only 1/10 of the parameters of the model developed in recent work (arXiv:2206.10550), achieves competitive or better certified accuracy in various settings. The DMAE ViT-Large model significantly surpasses all previous results, establishing a new state of the art on the ImageNet dataset. We further demonstrate that the pre-trained model transfers well to the CIFAR-10 dataset, suggesting its wide adaptability. Models and code are available at https://github.com/quanlin-wu/dmae.
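A minimal sketch of the two corruption steps described above (per-pixel Gaussian noise followed by random patch masking) and of the standard certified-radius computation for Gaussian smoothed classifiers (Cohen et al., 2019). The noise level, mask ratio, and patch size here are illustrative assumptions, not the paper's exact hyperparameters:

```python
import numpy as np
from scipy.stats import norm

def dmae_corrupt(image, sigma=0.25, mask_ratio=0.75, patch_size=16, rng=None):
    """Corrupt an image DMAE-style: add i.i.d. Gaussian noise to every
    pixel, then zero out a random subset of non-overlapping patches.

    image: float array of shape (H, W, C), values in [0, 1].
    sigma, mask_ratio, patch_size: illustrative values, not the
    paper's exact settings.
    """
    rng = rng or np.random.default_rng()
    noisy = image + rng.normal(0.0, sigma, size=image.shape)

    h, w, _ = image.shape
    grid_h, grid_w = h // patch_size, w // patch_size
    n_patches = grid_h * grid_w
    n_masked = int(mask_ratio * n_patches)
    masked = rng.choice(n_patches, size=n_masked, replace=False)

    out = noisy.copy()
    for idx in masked:
        r, c = divmod(idx, grid_w)
        out[r * patch_size:(r + 1) * patch_size,
            c * patch_size:(c + 1) * patch_size, :] = 0.0
    return out

def certified_radius(p_a_lower, sigma):
    """Certified L2 radius of a Gaussian smoothed classifier
    (Cohen et al., 2019): R = sigma * Phi^{-1}(p_A), where p_A is a
    lower confidence bound on the probability of the top class under
    noise. Returns a non-positive value if certification fails."""
    return sigma * norm.ppf(p_a_lower)
```

The reconstruction objective trains the encoder on exactly the noisy inputs that the smoothed classifier sees at certification time, which is why the pre-trained encoder plugs directly into the randomized smoothing framework.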