Despite impressive progress in deep learning, generalizing far beyond the training distribution remains an important open challenge. In this work, we consider few-shot classification and aim to shed light on what makes some novel classes easier to learn than others, and on what types of learned representations generalize better. To this end, we define a new paradigm in terms of attributes -- simple building blocks of which concepts are formed -- as a means of quantifying the degree of relatedness between different concepts. Our empirical analysis reveals that supervised learning generalizes poorly to new attributes, whereas combining self-supervised pretraining with supervised finetuning leads to stronger generalization. We further investigate the benefit of self-supervised pretraining and supervised finetuning through controlled experiments using random splits of the attribute space, and we find that the predictability of test attributes provides an informative estimate of a model's generalization ability.