Deep learning has achieved great success in recent years with the aid of advanced neural network structures and large-scale human-annotated datasets. However, it is often costly and difficult to annotate large-scale datasets accurately and efficiently, especially in specialized domains where fine-grained labels are required. In this setting, coarse labels are much easier to acquire, as they do not require expert knowledge. In this work, we propose a contrastive learning method, called $\textbf{Mask}$ed $\textbf{Con}$trastive learning~($\textbf{MaskCon}$), to address the under-explored problem setting where we learn from a coarsely labelled dataset in order to address a finer labelling problem. More specifically, within the contrastive learning framework, for each sample our method generates soft labels against other samples with the aid of the coarse labels and another augmented view of the sample in question. In contrast to self-supervised contrastive learning, where only a sample's own augmentations are considered hard positives, and supervised contrastive learning, where only samples with the same coarse label are considered hard positives, we propose soft labels based on inter-sample distances that are masked by the coarse labels. This allows us to exploit both inter-sample relations and the coarse labels. We show that our method recovers many existing state-of-the-art methods as special cases and that it provides tighter bounds on the generalization error. Experimentally, our method achieves significant improvements over the current state-of-the-art on various datasets, including CIFAR10, CIFAR100, ImageNet-1K, Stanford Online Products and Stanford Cars196. Code and annotations are available at https://github.com/MrChenFeng/MaskCon_CVPR2023.
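The following is a minimal PyTorch sketch of the masked soft-label idea described above, not the authors' exact implementation (see the repository for that). The temperature value, the softmax normalisation, and the function and argument names are illustrative assumptions; the key point is that soft labels are computed from inter-sample similarities but restricted (masked) to samples sharing the query's coarse label, while all other samples receive label zero.

```python
# Hypothetical sketch of MaskCon-style soft labels; names and the exact
# normalisation are assumptions, not the paper's official implementation.
import torch
import torch.nn.functional as F

def masked_soft_labels(query, keys, key_coarse, query_coarse, temperature=0.1):
    """Build soft labels for one query embedding against a bank of keys.

    query:        (D,)   embedding of one augmented view of the sample
    keys:         (N, D) embeddings of other samples (batch or memory bank)
    key_coarse:   (N,)   coarse labels of the keys
    query_coarse: ()     coarse label of the query sample
    """
    # Cosine similarities between the query and every key.
    sims = F.cosine_similarity(query.unsqueeze(0), keys, dim=1)  # (N,)

    # Coarse-label mask: only keys with the query's coarse label can be
    # (soft) positives; all other keys are hard negatives (label 0).
    same_coarse = key_coarse == query_coarse  # (N,) bool

    # Distance-based soft labels over the coarse-positive set.
    logits = (sims / temperature).masked_fill(~same_coarse, float('-inf'))
    soft = torch.softmax(logits, dim=0)
    return torch.nan_to_num(soft)  # guard the no-coarse-positive edge case

# Toy usage: 5 keys in 8-d, two of which share the query's coarse label.
q = F.normalize(torch.randn(8), dim=0)
k = F.normalize(torch.randn(5, 8), dim=1)
labels = masked_soft_labels(q, k, torch.tensor([0, 1, 0, 2, 1]), torch.tensor(1))
print(labels)  # nonzero weights only at indices 1 and 4
```

In the full method, the other augmented view of the query additionally serves as a hard positive with label one, so the final target blends this hard positive with the coarse-masked soft weights shown here.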