Concept-based explanation is a popular model interpretability approach because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impact of unobserved variables, together with a method that removes the effect of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concept set and show that our debiasing method works even when the concepts are not complete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and in improving the ranking of concepts by their contribution to the explanation of the predictions.
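The abstract describes the debiasing method only at a high level, so the following is a minimal, generic sketch of the two-stage (2SLS-style) regression idea it refers to, not the paper's actual estimator. All variable names (the confounder `u`, instrument `z`, concept score `c`, and model output `y`) and the simulated data are hypothetical assumptions used purely for illustration: when a concept score is correlated with an unobserved confounder, regressing the prediction on the instrumented concept score removes the confounding bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative only): an unobserved confounder u affects both
# the concept score c and the model output y, so a naive regression of y on c is
# biased. An instrument z influences c but affects y only through c.
n = 5000
u = rng.normal(size=n)                                   # unobserved confounder
z = rng.normal(size=n)                                   # instrument
c = 1.0 * z + 1.0 * u + rng.normal(scale=0.5, size=n)    # confounded concept score
y = 2.0 * c + 3.0 * u + rng.normal(scale=0.5, size=n)    # true concept effect = 2.0

def ols(X, y):
    """Ordinary least squares fit via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones((n, 1))

# Naive OLS of y on c: biased upward because u moves c and y together.
beta_naive = ols(np.hstack([ones, c[:, None]]), y)

# Stage 1: regress the concept score c on the instrument z and keep the fitted values.
gamma = ols(np.hstack([ones, z[:, None]]), c)
c_hat = np.hstack([ones, z[:, None]]) @ gamma

# Stage 2: regress y on the fitted concept score; the coefficient on c_hat is the
# debiased estimate of the concept's contribution.
beta_2sls = ols(np.hstack([ones, c_hat[:, None]]), y)

print(f"naive OLS estimate: {beta_naive[1]:.2f}")   # noticeably above 2.0
print(f"2SLS estimate:      {beta_2sls[1]:.2f}")    # close to 2.0
```

On this synthetic example the naive coefficient absorbs part of the confounder's effect, while the two-stage estimate recovers a value close to the true effect, which mirrors the kind of bias removal the abstract claims for concept attributions.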