Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision. However, existing approaches apply random or ad hoc masking strategies that limit the difficulty of the reconstruction task and, consequently, the strength of the learnt representations. We improve upon current state-of-the-art work in learning adversarial masks by proposing a new framework that generates masks in a sequential fashion with different constraints on the adversary. This leads to improvements in performance on various downstream tasks, such as classification on ImageNet100, STL10, and CIFAR10/100 and segmentation on Pascal VOC. Our results further demonstrate the promising capabilities of masking-based approaches for SSL in computer vision.
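To make the adversarial-masking idea concrete, the sketch below shows a minimal, hypothetical setup in PyTorch: a patch-level mask generator is trained to maximise the reconstruction loss of an MAE-style reconstructor, while the reconstructor is trained to minimise it. This is a simplified single-step illustration under assumed module and variable names (`MaskGenerator`, `Reconstructor`, `training_step`); it omits the sequential mask generation and the constraints on the adversary described in the abstract, and it is not the paper's implementation.

```python
# Minimal sketch of adversarial masking for masked image modeling.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, DIM = 16, 128  # 224x224 images -> 196 patches of 16x16


class MaskGenerator(nn.Module):
    """Scores each patch; the top-k highest-scoring patches are masked."""

    def __init__(self, dim=DIM):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, patch_tokens, mask_ratio=0.75):
        scores = self.scorer(patch_tokens).squeeze(-1)   # (B, N)
        k = int(mask_ratio * patch_tokens.size(1))
        idx = scores.topk(k, dim=1).indices              # patches to mask
        mask = torch.zeros_like(scores)
        mask.scatter_(1, idx, 1.0)                       # 1 = masked (hard mask)
        return mask, scores


class Reconstructor(nn.Module):
    """Tiny stand-in for a masked-autoencoder backbone."""

    def __init__(self, dim=DIM):
        super().__init__()
        self.embed = nn.Linear(3 * PATCH * PATCH, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        self.decode = nn.Linear(dim, 3 * PATCH * PATCH)

    def forward(self, patches, mask):
        tokens = self.embed(patches)
        tokens = tokens * (1.0 - mask.unsqueeze(-1))     # drop masked patches
        return self.decode(self.encoder(tokens))


def training_step(images, reconstructor, mask_gen, opt_rec, opt_mask):
    # images: (B, 3, 224, 224) -> (B, N, 3*16*16) patch sequence
    patches = F.unfold(images, kernel_size=PATCH, stride=PATCH).transpose(1, 2)
    with torch.no_grad():
        tokens = reconstructor.embed(patches)
    mask, scores = mask_gen(tokens)

    # Reconstructor minimises reconstruction error on the masked patches.
    recon = reconstructor(patches, mask)
    loss_rec = ((recon - patches) ** 2).mean(-1)
    loss_rec = (loss_rec * mask).sum() / mask.sum().clamp(min=1)
    opt_rec.zero_grad()
    loss_rec.backward()
    opt_rec.step()

    # Adversary maximises the same error through a differentiable soft mask.
    soft_mask = torch.sigmoid(scores)
    recon = reconstructor(patches, soft_mask)
    loss_adv = -(((recon - patches) ** 2).mean(-1) * soft_mask).mean()
    opt_mask.zero_grad()
    loss_adv.backward()
    opt_mask.step()
    return loss_rec.item()
```

In this toy setup the adversary learns to hide the patches that are hardest to reconstruct, which raises the difficulty of the pretext task relative to random masking; the framework summarised above additionally generates masks sequentially and constrains the adversary in different ways.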