Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior art baselines which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
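As an illustration of a data-dependent weighted cross-entropy over ensemble heads, here is a minimal sketch for a single example. It makes several assumptions that are not taken from the abstract: the teacher/student pairing of heads, the function names, the `gamma` parameter, and in particular the weighting rule (a softmax over heads of the negative teacher entropy). That rule is only one hypothetical choice and is not claimed to be any of the weighting schemes studied in the paper. The logits are assumed to come from several projection heads on top of one shared backbone, which is why only the heads are ensembled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred) between two probability vectors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def head_weights(teacher_probs, gamma=1.0):
    """Hypothetical data-dependent weights, one per head.

    Heads whose teacher prediction has lower entropy get more weight;
    gamma = 0 recovers uniform weights. This rule is an assumption.
    """
    return softmax([-gamma * entropy(p) for p in teacher_probs])


def weighted_ensemble_loss(teacher_logits, student_logits, gamma=1.0):
    """Weighted sum of per-head cross-entropies for one example.

    Both arguments have shape [num_heads][num_classes].
    """
    teacher_probs = [softmax(z) for z in teacher_logits]
    student_probs = [softmax(z) for z in student_logits]
    weights = head_weights(teacher_probs, gamma)
    return sum(
        w * cross_entropy(t, s)
        for w, t, s in zip(weights, teacher_probs, student_probs)
    )
```

With `gamma=0` the loss reduces to the plain average of the per-head cross-entropies, so the weighting only changes the objective when the heads disagree in confidence on a given input.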