Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) make it possible to leverage large unlabeled corpora to reach state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that adds only a small training cost and requires no architectural changes or computational overhead at downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both on multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of the ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior-art baselines, which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
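To make the high-level description concrete, the sketch below illustrates the general idea of ensembling only lightweight projection heads on top of a shared backbone and combining their cross-entropy terms with data-dependent weights. It is a minimal illustration, not the authors' implementation: the module names, head architecture, and the particular per-sample softmax weighting (one plausible scheme for encouraging head diversity) are assumptions made for exposition.

```python
# Minimal PyTorch-style sketch (illustrative, not the paper's code): a shared
# backbone with K projection heads, trained with a data-dependent weighted
# cross-entropy loss aggregated over the heads.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeadEnsemble(nn.Module):
    """Shared backbone with K independent projection heads (the backbone itself is not ensembled)."""

    def __init__(self, backbone: nn.Module, embed_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(embed_dim, 2048), nn.GELU(), nn.Linear(2048, out_dim))
            for _ in range(num_heads)
        )

    def forward(self, x):
        z = self.backbone(x)                      # one backbone pass, reused by every head
        return [head(z) for head in self.heads]   # list of K projection outputs


def weighted_ensemble_loss(student_logits, teacher_probs, temperature=0.1):
    """Data-dependent weighted cross-entropy across heads.

    For each head, the per-sample cross-entropy against the teacher targets is
    reweighted (here via a softmax over the batch) so that samples where the head
    disagrees more with the teacher receive larger weight; this weighting is an
    assumed example, not the paper's exact scheme.
    """
    total = 0.0
    for s_logits, t_probs in zip(student_logits, teacher_probs):
        log_p = F.log_softmax(s_logits / temperature, dim=-1)
        per_sample_ce = -(t_probs * log_p).sum(dim=-1)       # shape: [batch]
        w = torch.softmax(per_sample_ce.detach(), dim=0)     # data-dependent weights
        total = total + (w * per_sample_ce).sum()
    return total / len(student_logits)
```

In this view, the extra heads add parameters only during pre-training; downstream evaluation can use the unchanged backbone representations, consistent with the abstract's claim of no architectural changes or computational overhead at evaluation time.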