Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL), yet it has been criticized for learning inefficiency. We believe the insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjoint regulation that raises the usage of tokens for reconstruction in each image while keeping the masking rate of each view unchanged. For joint distillation (JD), we adopt a dual-branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooted in orthogonal perspectives on improving training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing model generalization. Concretely, DM trains ViT to competitive performance with half the effective training epochs (3.7 times less time consumption). With JD, our DMJD clearly improves linear-probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and models will be made public at https://github.com/mx-mark/DMJD.
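Below is a minimal sketch of one plausible way to realize the disjoint regulation described above, assuming the visible (unmasked) token sets of the different views are drawn from non-overlapping slices of a single per-image permutation, so that more tokens are covered as reconstruction targets across views while each view keeps the same masking rate. Function and parameter names (e.g., disjoint_masked_views) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def disjoint_masked_views(num_tokens, mask_ratio, num_views, rng=None):
    """Return a (num_views, num_tokens) boolean array; True marks a masked token.

    Each view keeps the same mask_ratio; the visible slices of the views are
    taken from non-overlapping parts of one shared permutation, so the union of
    masked (reconstructed) tokens across views is maximized.
    Illustrative sketch only, not the DMJD authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_visible = num_tokens - int(num_tokens * mask_ratio)
    assert num_views * num_visible <= num_tokens, \
        "too many views for disjoint visible sets at this mask ratio"
    perm = rng.permutation(num_tokens)                    # one shuffle per image
    masks = np.ones((num_views, num_tokens), dtype=bool)  # start fully masked
    for v in range(num_views):
        visible = perm[v * num_visible:(v + 1) * num_visible]  # disjoint slice
        masks[v, visible] = False                         # unmask this view's tokens
    return masks

# Example: 196 ViT patch tokens, 75% masking, 2 views
m = disjoint_masked_views(196, 0.75, 2)
print(m.sum(axis=1))                   # [147 147] -> same masking rate per view
print(np.logical_or.reduce(m).sum())   # 196 -> every token reconstructed in some view
```

With independently sampled random masks, many tokens would never be reconstructed in any view; the shared permutation with disjoint visible slices is what raises token usage per image in this sketch.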