Despite the significant improvements that representation learning via self-supervision has led to when learning from unlabeled data, no methods exist that explain what influences the learned representation. We address this need through our proposed approach, RELAX, the first method for attribution-based explanations of representations. RELAX can also model the uncertainty in its explanations, which is essential for producing trustworthy explanations. It explains representations by measuring similarities in the representation space between an input and masked-out versions of itself, providing intuitive explanations and significantly outperforming the gradient-based baseline. We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning, providing insights into different learning strategies. Finally, we illustrate the usability of RELAX in multi-view clustering and highlight that incorporating uncertainty can be essential for providing low-complexity explanations, taking a crucial step towards explaining representations.
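To make the masking idea described above concrete, the sketch below estimates per-pixel importance as the mask-weighted average cosine similarity between the representation of the full image and representations of randomly masked copies, and uses the mask-weighted variance of those similarities as an uncertainty map. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names (`relax_importance`, `random_masks`), the mask-generation scheme, the default `num_masks`, and the stand-in encoder are all hypothetical choices for the example.

```python
import torch
import torch.nn.functional as F

def random_masks(num_masks, height, width, cell=7, device="cpu"):
    # Low-resolution Bernoulli noise upsampled to image size yields smooth occlusion masks.
    grid = (torch.rand(num_masks, 1, cell, cell, device=device) < 0.5).float()
    return F.interpolate(grid, size=(height, width), mode="bilinear", align_corners=False)

def relax_importance(encoder, image, num_masks=300):
    """Mask-weighted mean and variance of representation similarities.

    importance[i, j]  ~ average cosine similarity between the representation of the
                        full image and of masked copies that keep pixel (i, j) visible
    uncertainty[i, j] ~ variance of those similarities, i.e. how much the explanation
                        for pixel (i, j) fluctuates across masks
    """
    _, _, height, width = image.shape
    with torch.no_grad():
        h_star = encoder(image)                                   # reference representation
        masks = random_masks(num_masks, height, width, device=image.device)
        importance = torch.zeros(height, width, device=image.device)
        m2 = torch.zeros(height, width, device=image.device)
        weight = torch.full((height, width), 1e-8, device=image.device)
        for m in masks:                                           # m: (1, H, W)
            s = F.cosine_similarity(encoder(image * m), h_star, dim=1).squeeze()
            # Weighted Welford update: running mask-weighted mean and variance.
            weight += m[0]
            delta = s - importance
            importance += m[0] * delta / weight
            m2 += m[0] * delta * (s - importance)
        uncertainty = m2 / weight
    return importance, uncertainty

# Usage with a stand-in encoder; any model mapping images to vector representations works.
encoder = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)
importance, uncertainty = relax_importance(encoder, torch.rand(1, 3, 64, 64), num_masks=50)
```

The running mean/variance update avoids storing all similarity scores; regions with high importance but low uncertainty are the ones the explanation can be trusted on, which is the role uncertainty plays in the approach described above.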