Deep neural networks are capable of learning powerful representations for complex vision tasks, but they exhibit undesirable properties such as overfitting. Regularization techniques like image augmentation are therefore necessary for deep neural networks to generalize well. Nevertheless, most prevalent image augmentation recipes confine themselves to off-the-shelf linear transformations such as scaling, flipping, and color jitter. Because these augmentations are hand-crafted, they are insufficient for generating truly hard augmented examples. In this paper, we propose a novel augmentation perspective to regularize the training process. Inspired by the recent success of applying masked image modeling to self-supervised learning, we adopt a self-supervised masked autoencoder to generate distorted views of the input images. We show that utilizing such a model-based nonlinear transformation as data augmentation can improve high-level recognition tasks. We term the proposed method \textbf{M}ask-\textbf{R}econstruct \textbf{A}ugmentation (MRA). Extensive experiments on various image classification benchmarks verify the effectiveness of the proposed augmentation. Specifically, MRA consistently enhances performance on supervised, semi-supervised, and few-shot classification. The code will be available at \url{https://github.com/haohang96/MRA}.
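As a rough illustration of the mask-then-reconstruct idea described above, the sketch below masks a random subset of image patches and pastes a model's reconstruction back into the masked regions, leaving visible patches untouched. This is a minimal NumPy sketch, not the paper's implementation: `mean_fill` is a toy stand-in for the pretrained masked autoencoder, and all function names and parameters here are illustrative assumptions.

```python
import numpy as np

def mask_reconstruct_augment(image, reconstructor, patch=16, mask_ratio=0.5, seed=None):
    """Mask a fraction of patches, then fill masked regions with a model's reconstruction.

    image: (H, W, C) float array; reconstructor: callable mapping a masked image
    to a reconstructed image of the same shape (toy stand-in for an MAE decoder).
    """
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    gh, gw = h // patch, w // patch           # patch-grid dimensions
    n_patches = gh * gw
    masked_idx = rng.choice(n_patches, size=int(n_patches * mask_ratio), replace=False)

    out = image.astype(np.float32).copy()
    mask = np.zeros((h, w), dtype=bool)
    for idx in masked_idx:                    # build a pixel-level mask from patch indices
        r, c = divmod(idx, gw)
        mask[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = True

    out[mask] = 0.0                           # drop masked patches
    recon = reconstructor(out)                # reconstruct from the visible patches
    out[mask] = recon[mask]                   # paste reconstruction only into masked areas
    return out

def mean_fill(img):
    """Toy 'reconstructor': fill everything with the global mean of the masked image."""
    return np.full_like(img, img.mean())

augmented = mask_reconstruct_augment(np.ones((64, 64, 3)), mean_fill, seed=0)
```

In the actual method, the reconstructor would be a pretrained masked autoencoder, so the filled-in regions are plausible but imperfect, which is what makes the augmented view a nonlinear, genuinely hard example rather than a simple geometric or color distortion.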