Unsupervised representation learning has recently made remarkable progress, especially with the success of contrastive learning, which regards each image as well as its augmentations as a separate class, but does not consider the semantic similarity among different images. This paper proposes a new kind of data augmentation, named Center-wise Local Image Mixture (CLIM), to expand the neighborhood space of an image. CLIM encourages both local similarity and global aggregation when pulling similar images together. This is achieved by searching local similar samples of an image and selecting only those that are closer to the corresponding cluster center, which we denote as center-wise local selection. As a result, similar representations progressively approach the cluster centers without breaking the local similarity. Furthermore, image mixture is used as a smoothing regularization to avoid overconfidence on the selected samples. In addition, we introduce multi-resolution augmentation, which makes the learned representation scale invariant. Integrating the two augmentations produces better feature representations on several unsupervised benchmarks. Notably, we reach 75.5% top-1 accuracy with linear evaluation over ResNet-50, and 59.3% top-1 accuracy when fine-tuned with only 1% of the labels, and we consistently outperform supervised pretraining on several downstream transfer tasks.
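To make the two core operations concrete, below is a minimal NumPy sketch of center-wise local selection and the image-mixture regularizer described above. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the Euclidean distance metric, the neighborhood size `k`, and the mixup-style Beta(`alpha`, `alpha`) mixing prior are all assumptions introduced here.

```python
import numpy as np

def center_wise_local_selection(features, anchor_idx, cluster_centers,
                                assignments, k=10):
    """Sketch of center-wise local selection: among the k nearest
    neighbors of an anchor, keep only samples that lie closer to the
    anchor's cluster center than the anchor itself does."""
    anchor = features[anchor_idx]
    center = cluster_centers[assignments[anchor_idx]]
    # k nearest neighbors of the anchor in feature space (local similarity).
    dists = np.linalg.norm(features - anchor, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]  # skip the anchor itself
    # Keep only neighbors closer to the cluster center than the anchor
    # (global aggregation): pulling them toward the anchor moves the
    # representation toward the center without breaking local similarity.
    anchor_to_center = np.linalg.norm(anchor - center)
    return [j for j in neighbors
            if np.linalg.norm(features[j] - center) < anchor_to_center]

def image_mixture(anchor_img, selected_img, alpha=1.0):
    """Mix the anchor with a selected sample as a smoothing regularizer
    (mixup-style convex combination; the Beta prior is an assumption)."""
    lam = np.random.beta(alpha, alpha)
    return lam * anchor_img + (1.0 - lam) * selected_img
```

The selection step realizes the "local similarity plus global aggregation" idea in the abstract: candidates come from the anchor's own neighborhood, while the center-distance test filters them so that training pressure always points toward the cluster center.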