Recent work has compared neural network representations via similarity-based analyses, shedding light on how different factors (architecture, training data, etc.) affect models' internal representations. The quality of a similarity measure is typically evaluated by how reliably it assigns a high score to representations that are expected to match. However, existing similarity measures achieve only mediocre results on standard benchmarks. In this work, we develop a new similarity measure, dubbed ContraSim, based on contrastive learning. Unlike common closed-form similarity measures, ContraSim learns a parameterized measure from both similar and dissimilar examples. We perform an extensive experimental evaluation of our method, with both language and vision models, on the standard layer prediction benchmark and on two new benchmarks that we introduce: the multilingual benchmark and the image-caption benchmark. In all cases, ContraSim achieves much higher accuracy than previous similarity measures, even when presented with challenging examples, and reveals new insights not captured by previous measures.
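To make the idea of a learned, contrastive similarity measure concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the ContraSim implementation: the linear projections `w_a` and `w_b`, the cosine similarity, the InfoNCE-style objective, and the `temperature` value are all hypothetical choices made for this example. Row `i` of each input is treated as a similar (positive) pair, and every other row in the batch serves as a dissimilar (negative) example.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed objective for this sketch).

    Row i of z_a and row i of z_b form a positive pair; all other rows
    of z_b act as negatives for row i of z_a.
    """
    logits = cosine_similarity_matrix(z_a, z_b) / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def learned_similarity(x_a, x_b, w_a, w_b):
    """Mean cosine similarity of matched rows after (hypothetical) linear encoders."""
    z_a, z_b = x_a @ w_a, x_b @ w_b
    return float(np.mean(np.diag(cosine_similarity_matrix(z_a, z_b))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))   # toy "representations" of 8 examples
    identity = np.eye(4)          # untrained stand-in for encoder weights
    print("similarity:", learned_similarity(x, x, identity, identity))
    print("matched loss:", info_nce_loss(x, x))
    print("mismatched loss:", info_nce_loss(x, np.roll(x, 1, axis=0)))
```

In a trained setting, the projection weights would be optimized to lower this loss, which is what distinguishes a parameterized measure from a closed-form one; the training loop is omitted here.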