We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A detailed example with a naive-Bayes classifier is presented.
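To make the notion of a responsibility score concrete, the following is a minimal brute-force sketch in Python rather than an answer-set program. It assumes a commonly used causality-style definition that the abstract itself does not spell out: a feature value has responsibility 1/(1+|Γ|), where Γ is a smallest set of other features whose change alone leaves the label unchanged, but whose change together with the feature in question flips it. The feature names `a`, `b`, `c`, the binary domains, and all probabilities are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical naive-Bayes parameters (illustrative only, not from the paper).
PRIOR = {1: 0.4, 0: 0.6}
# P(feature = 1 | class label)
P_ONE = {
    "a": {1: 0.9, 0: 0.1},
    "b": {1: 0.8, 0: 0.2},
    "c": {1: 0.7, 0: 0.3},
}
FEATURES = list(P_ONE)


def classify(entity):
    """Return the label (0 or 1) with the larger naive-Bayes joint score."""
    score = {}
    for label in (0, 1):
        p = PRIOR[label]
        for f in FEATURES:
            q = P_ONE[f][label]
            p *= q if entity[f] == 1 else 1.0 - q
        score[label] = p
    return 1 if score[1] > score[0] else 0


def intervene(entity, features):
    """Counterfactual intervention: flip the given binary features."""
    changed = dict(entity)
    for f in features:
        changed[f] = 1 - changed[f]
    return changed


def responsibility(entity, feature):
    """Responsibility of `feature` under the assumed definition:
    1 / (1 + |gamma|) for a smallest contingency set gamma, or 0.0 if none."""
    original = classify(entity)
    others = [f for f in FEATURES if f != feature]
    for size in range(len(others) + 1):  # smallest contingency sets first
        for gamma in combinations(others, size):
            # The contingency set alone must not change the label ...
            if classify(intervene(entity, gamma)) != original:
                continue
            # ... but adding the feature itself must flip it.
            if classify(intervene(entity, gamma + (feature,))) != original:
                return 1.0 / (1 + size)
    return 0.0


entity = {"a": 1, "b": 1, "c": 1}
scores = {f: responsibility(entity, f) for f in FEATURES}
```

With these made-up parameters, changing `a` alone flips the label, whereas `b` and `c` each need the other as a one-element contingency set. An answer-set program would state the same conditions declaratively and leave the search to the solver; the exhaustive enumeration here is exponential in the number of features and is meant only to illustrate the definition.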