What role do augmentations play in contrastive learning? Recent work suggests that good augmentations are label-preserving with respect to a specific downstream task. We complicate this picture by showing that label-destroying augmentations can be useful in the foundation model setting, where the goal is to learn diverse, general-purpose representations for multiple downstream tasks. We perform contrastive learning experiments on a range of image and audio datasets with multiple downstream tasks (e.g. for digits superimposed on photographs, predicting the class of one vs. the other). We find that Viewmaker Networks, a recently proposed model for learning augmentations for contrastive learning, produce label-destroying augmentations that stochastically destroy features needed for different downstream tasks. These augmentations are interpretable (e.g. altering shapes, digits, or letters added to images) and surprisingly often result in better performance compared to expert-designed augmentations, despite not preserving label information. To support our empirical results, we theoretically analyze a simple contrastive learning setting with a linear model. In this setting, label-destroying augmentations are crucial for preventing one set of features from suppressing the learning of features useful for another downstream task. Our results highlight the need for analyzing the interaction between multiple downstream tasks when trying to explain the success of foundation models.