Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box's output to interpret. In the absence of labels, black-box outputs are often representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance, which respectively highlight the features and training examples that most influence the representations a black-box constructs at inference time. We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods. We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.
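A minimal sketch of how such a label-free feature importance wrapper could look, assuming a PyTorch encoder. The representation map f is reduced to a scalar by taking the inner product of f(x) with itself, and an off-the-shelf attribution method is then applied to that scalar; gradient-times-input is used here purely as a stand-in for any existing attribution technique. The names `encoder` and `label_free_feature_importance`, the toy architecture, and the choice of attribution method are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


def label_free_feature_importance(encoder: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-feature importance scores for the representation f(x) of input x."""
    x = x.detach().clone().requires_grad_(True)
    with torch.no_grad():
        anchor = encoder(x)                  # fixed representation f(x), no gradient tracking
    score = (encoder(x) * anchor).sum()      # scalar surrogate: inner product <f(x), f(x)>
    score.backward()                         # gradients flow back to the input features
    return (x.grad * x).abs().detach()       # gradient-times-input attribution, one score per feature


# Example usage with a toy encoder (purely illustrative).
if __name__ == "__main__":
    encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 8))
    x = torch.randn(1, 20)
    print(label_free_feature_importance(encoder, x).shape)  # torch.Size([1, 20])
```

The same pattern extends to other attribution methods: any technique that explains a scalar-valued function can be wrapped this way, since the surrogate removes the need to pick a single component of the representation vector.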