Concept Bottleneck Models (CBMs) first map raw inputs to a vector of human-defined concepts and then use this vector to predict a final classification. We might therefore expect CBMs to predict each concept from a distinct region of the input. If so, explanations that visualise the input features corresponding to each concept would support human interpretation of the model's outputs. The contribution of this paper is threefold. Firstly, we expand on existing literature by examining relevance both from the input to the concept vector, confirming that relevance is distributed among the input features, and from the concept vector to the final classification, where, for the most part, the classification is made using concepts predicted as present. Secondly, we report a quantitative evaluation that measures the distance between the input feature with maximum relevance and the ground-truth location; we perform this with Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG) and a baseline gradient approach, finding that LRP has a lower average distance than IG. Thirdly, we propose using the proportion of relevance as a measurement for explaining concept importance.
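The two measurements named above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of Euclidean distance in pixel coordinates, the use of absolute relevance when computing proportions, and the example concept names are all assumptions made for illustration.

```python
import math


def max_relevance_distance(relevance, truth_xy):
    """Distance from the most relevant pixel to a ground-truth point.

    `relevance` is a 2D list (rows of pixel relevance scores) and
    `truth_xy` is an (x, y) pixel coordinate. Euclidean distance is
    an assumption; the abstract does not name the metric.
    """
    coords = (
        (x, y)
        for y, row in enumerate(relevance)
        for x in range(len(row))
    )
    # Pixel with the highest relevance score.
    best = max(coords, key=lambda p: relevance[p[1]][p[0]])
    return math.dist(best, truth_xy)


def relevance_proportions(concept_relevance):
    """Share of total relevance attributed to each concept.

    Absolute values are used so that negative relevance still counts
    towards the total (an assumption, not taken from the abstract).
    """
    total = sum(abs(r) for r in concept_relevance.values())
    if total == 0:
        return {name: 0.0 for name in concept_relevance}
    return {name: abs(r) / total for name, r in concept_relevance.items()}


# Hypothetical 3x4 relevance map whose maximum is at (x=2, y=1).
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]
distance = max_relevance_distance(heatmap, (0, 1))

# Hypothetical concept-to-class relevance scores.
shares = relevance_proportions({"wing_colour": 3.0, "beak_shape": -1.0})
```

Averaging `max_relevance_distance` over a dataset would give one number per attribution technique, which is the kind of comparison the abstract reports between LRP and IG.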