Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case, because distances in pixel space often do not capture our sense of similarity, and therefore neighbors are not semantically close. This problem can be circumvented by self-supervised approaches based on contrastive learning, such as SimCLR, which rely on data augmentation to generate implicit neighbors, but these methods do not produce two-dimensional embeddings suitable for visualization. Here we present a new method, called t-SimCNE, for unsupervised visualization of image data. t-SimCNE combines ideas from contrastive learning and neighbor embeddings, and trains a parametric mapping from the high-dimensional pixel space into two dimensions. We show that the resulting 2D embeddings achieve classification accuracy comparable to state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers.
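To make the core idea concrete, the following is a minimal sketch, not the authors' reference implementation: a convolutional network maps two augmented views of each image directly into 2D, and an InfoNCE-style contrastive loss with a t-SNE-style Cauchy similarity kernel pulls the two views together. The class name `TSimCNESketch`, the ResNet-18 backbone, the head size, and the loss helper `cauchy_contrastive_loss` are illustrative assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torchvision

class TSimCNESketch(nn.Module):
    """CNN backbone plus a small projection head mapping images into 2D."""
    def __init__(self, out_dim: int = 2):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep convolutional features only
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(), nn.Linear(1024, out_dim)
        )

    def forward(self, x):
        return self.head(self.backbone(x))

def cauchy_contrastive_loss(z_a, z_b):
    """InfoNCE-style loss with a Cauchy kernel q_ij = 1 / (1 + ||z_i - z_j||^2)."""
    n = z_a.shape[0]
    z = torch.cat([z_a, z_b], dim=0)                      # (2n, 2) embeddings
    q = 1.0 / (1.0 + torch.cdist(z, z).pow(2))            # pairwise similarities
    mask = ~torch.eye(2 * n, dtype=torch.bool)            # drop self-similarities
    denom = (q * mask).sum(dim=1)
    pos = torch.cat([torch.diag(q, n), torch.diag(q, -n)])  # the two views of each image
    return (-torch.log(pos / denom)).mean()

# Usage: x_a and x_b stand in for two augmented views of the same image batch.
model = TSimCNESketch()
x_a, x_b = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
loss = cauchy_contrastive_loss(model(x_a), model(x_b))
loss.backward()
```

The 2D output and the heavy-tailed Cauchy kernel (in place of SimCLR's cosine similarity with a temperature) are what connect the contrastive setup to neighbor-embedding methods such as t-SNE; backbone choice, augmentations, and training schedule would follow the usual contrastive-learning recipes.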