Discovering what is learned by neural networks remains a challenge. In self-supervised learning (SSL), classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit our understanding of what information is retained in the representation of a given input. In this work, we showcase the use of a Representation Conditional Diffusion Model (RCDM) to visualize in data space the representations learned by self-supervised models. The use of RCDM is motivated by its ability to generate high-quality samples, on par with state-of-the-art generative models, while ensuring that the representations of those samples are faithful, i.e., close to the one used for conditioning. By using RCDM to analyze self-supervised models, we are able to show visually that i) SSL (backbone) representations are not invariant to the data augmentations they were trained with, thus debunking an often restated but mistaken belief; ii) SSL post-projector embeddings do appear invariant to these data augmentations, along with many other data symmetries; iii) SSL representations appear more robust to small adversarial perturbations of their inputs than representations trained in a supervised manner; and iv) SSL-trained representations exhibit an inherent structure that can be explored thanks to RCDM visualization and that enables image manipulation.