We argue that a valuable perspective on when a model learns \textit{good} representations is that inputs which the model maps to similar representations should also be perceived as similar by humans. We use \textit{representation inversion} to generate multiple inputs that map to the same model representation, and then quantify the perceptual similarity of these inputs via human surveys. Our approach yields a measure of the extent to which a model is aligned with human perception. Using this measure, we evaluate models trained with various learning paradigms (\eg~supervised and self-supervised learning) and different training losses (standard and robust training). Our results suggest that the alignment of representations with human perception provides useful additional insights into the qualities of a model. For example, we find that alignment with human perception can serve as a measure of trust in a model's predictions on inputs where different models produce conflicting outputs. We also find that various properties of a model, such as its architecture, training paradigm, training loss, and data augmentation, play a significant role in learning representations that are aligned with human perception.
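As a concrete illustration of the representation-inversion step described above, the following is a minimal sketch, assuming a PyTorch feature extractor \texttt{model} that maps images to representation vectors; the function name \texttt{invert\_representation} and its arguments (\texttt{target\_image}, \texttt{seed\_image}, \texttt{steps}, \texttt{lr}) are illustrative and not taken from any released code.

\begin{verbatim}
import torch
import torch.nn.functional as F

def invert_representation(model, target_image, seed_image,
                          steps=500, lr=0.1):
    """Optimize seed_image so that its representation under `model`
    matches the representation of target_image."""
    model.eval()
    with torch.no_grad():
        target_rep = model(target_image)      # fixed target representation
    x = seed_image.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(x), target_rep)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                # keep x a valid image
    return x.detach()
\end{verbatim}

Under these assumptions, running the optimization from several different \texttt{seed\_image} starting points yields multiple inputs that map to (approximately) the same representation, which can then be compared for perceptual similarity in human surveys.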