Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is the Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. Linear separability is usually implicitly assumed but does not hold in general. In this work, we started from the original intent of concept-based interpretation and proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We showed that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in the concept affects the model's prediction, which extends gradient-based interpretation to the concept space. We demonstrated empirically that CG outperforms CAV on both toy examples and real-world datasets.
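As a concrete illustration of this idea, the sketch below shows one way a concept-space gradient could be computed via the chain rule: the gradient of the prediction with respect to a chosen representation is mapped into concept space through the pseudo-inverse of the Jacobian of a differentiable (potentially non-linear) concept function. The functions `f` and `g` and the pseudo-inverse formulation are illustrative assumptions for this sketch, not necessarily the paper's exact definition of CG.

```python
import torch

def concept_gradient(f, g, x):
    """Illustrative concept-space gradient via the chain rule.

    f: differentiable model mapping a representation x -> scalar prediction
    g: differentiable concept function mapping x -> k concept scores
    x: tensor of shape (d,) (an input or an intermediate representation)

    Returns a length-k vector attributing the prediction to each concept.
    The pseudo-inverse mapping below is one plausible formulation
    (a least-squares solve of J_g^T c = grad_x f), used here as an
    assumption rather than the paper's exact definition.
    """
    x = x.detach().requires_grad_(True)

    # Gradient of the prediction w.r.t. the representation: shape (d,)
    grad_f = torch.autograd.grad(f(x), x)[0]

    # Jacobian of the concept function w.r.t. the representation: shape (k, d)
    jac_g = torch.autograd.functional.jacobian(g, x)

    # Map the gradient into concept space with the Moore-Penrose
    # pseudo-inverse of the concept Jacobian: shape (k,)
    return torch.linalg.pinv(jac_g).T @ grad_f
```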