Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g., ImageNet pretraining), but it remains an open theoretical question why this kind of task-specific pretraining should result in "good" representations that actually capture the underlying structure of the data. We conduct an information-theoretic analysis of several commonly used supervision signals from contrastive learning and classification to determine how they contribute to representation learning performance and how the dynamics of learning are affected by training parameters such as the number of labels, classes, and dimensions in the training dataset. We validate these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.