In this paper, we introduce a new approach, called "Posthoc Interpretation via Quantization (PIQ)", for interpreting decisions made by trained classifiers. Our method uses vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input data that the classifier deems relevant for making a prediction. We evaluated our method through quantitative and qualitative studies and found that participants in our user studies understood the interpretations generated by PIQ more easily than those produced by several other interpretation methods from the literature.
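To make the quantization step concrete, the following is a minimal sketch of a nearest-neighbor lookup in a class-specific codebook. It is an illustration under stated assumptions, not the PIQ implementation: the function name `quantize_with_class_codebook`, the array shapes, and the use of squared Euclidean distance are all hypothetical choices, and the codebooks here are random rather than learned.

```python
import numpy as np


def quantize_with_class_codebook(latents, codebooks, class_id):
    """Map each latent vector to its nearest entry in one class's codebook.

    latents:   (N, D) array of classifier representations (assumed shape).
    codebooks: (C, K, D) array, one K-entry codebook per class (assumed layout).
    class_id:  index of the class whose codebook is used.
    """
    codebook = codebooks[class_id]  # (K, D)
    # Squared Euclidean distance between every latent and every code: (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # nearest code per latent: (N,)
    return codebook[indices], indices


# Toy data standing in for learned codebooks and classifier representations.
rng = np.random.default_rng(0)
num_classes, codes_per_class, dim = 3, 4, 2
codebooks = rng.normal(size=(num_classes, codes_per_class, dim))
latents = rng.normal(size=(5, dim))

quantized, indices = quantize_with_class_codebook(latents, codebooks, class_id=1)
```

Because every output vector is drawn from the selected class's codebook, a decoder that consumes `quantized` can only use information expressible by that class's codes, which is the bottleneck effect described in the abstract.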