We developed a novel self-supervised learning (SSL) approach that captures both global consistency and pixel-level local consistency between differently augmented views of the same image, so that the learned representations serve both discriminative and dense predictive downstream tasks. We adopted the teacher-student architecture used in previous contrastive SSL methods. In our method, global consistency is enforced by aggregating the compressed representations of augmented views of the same image. Pixel-level consistency is enforced by pursuing similar representations for the same pixel in differently augmented views. Importantly, we introduced an uncertainty-aware context stabilizer that adaptively preserves the context gap created by the two differently augmented views. Moreover, we used Monte Carlo dropout in the stabilizer to measure uncertainty and adaptively balance the discrepancy between the representations of the same pixels in different views.
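To make the last two ideas concrete, the following is a minimal NumPy sketch of how Monte Carlo dropout could yield a per-pixel uncertainty and how that uncertainty could reweight a pixel-level consistency loss. It is an illustration under stated assumptions, not the stabilizer described above: the function names, the single linear projection, the cosine distance, and the `exp(-variance)` weighting are all hypothetical choices made for this example.

```python
import numpy as np


def mc_dropout_uncertainty(features, weight, rng, n_passes=8, drop_prob=0.5):
    """Estimate per-pixel uncertainty with Monte Carlo dropout.

    features: (num_pixels, in_dim) pixel features from one view.
    weight:   (in_dim, out_dim) hypothetical linear projection.
    Returns a (num_pixels,) array: the variance of the projected features
    across stochastic passes, averaged over output channels.
    """
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(n_passes):
        # A fresh dropout mask per pass keeps the forward pass stochastic.
        mask = rng.random(features.shape) < keep
        dropped = features * mask / keep  # inverted-dropout scaling
        outputs.append(dropped @ weight)
    stacked = np.stack(outputs)  # (n_passes, num_pixels, out_dim)
    return stacked.var(axis=0).mean(axis=1)


def uncertainty_weighted_pixel_loss(student, teacher, uncertainty):
    """Cosine distance between matching pixels, down-weighted where uncertain.

    student, teacher: (num_pixels, dim) features of the same pixels in two views.
    The exp(-uncertainty) weighting is an assumption made for this sketch.
    """
    eps = 1e-8
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    distance = 1.0 - (s * t).sum(axis=1)  # per-pixel cosine distance
    weights = np.exp(-uncertainty)        # higher uncertainty, lower weight
    return float((weights * distance).sum() / weights.sum())


rng = np.random.default_rng(0)
student_feats = rng.normal(size=(16, 32))  # 16 pixels, 32-dim features
teacher_feats = student_feats + 0.1 * rng.normal(size=(16, 32))
projection = rng.normal(size=(32, 8))

pixel_uncertainty = mc_dropout_uncertainty(student_feats, projection, rng)
loss = uncertainty_weighted_pixel_loss(student_feats, teacher_feats, pixel_uncertainty)
```

In this sketch, pixels whose projected features vary strongly across dropout passes contribute less to the consistency loss, which is one simple way to let uncertainty balance the discrepancy between views; other weighting schemes would fit the same description.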