We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely "responsibility scores". The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this work is the specification and computation of "best" counterfactual entities, i.e. those that lead to maximum responsibility scores. From them one can read off the explanations, namely the maximum-responsibility feature values in the original entity. We also extend the programs to bring semantic or domain knowledge into the picture. Finally, we show how the approach can be extended by means of probabilistic methods, and how the underlying probability distributions can be modified through the use of constraints.
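As an illustration of the underlying idea (not of the answer-set encoding itself), the search for best counterfactual entities and the induced responsibility scores can be sketched with a brute-force procedure over a black-box classifier. All names here (`responsibility_scores`, `classify`, `domains`) are hypothetical, and the score `1/|S|` for a minimum-size change set `S` is a simplified stand-in for the responsibility notion in the text:

```python
from itertools import combinations, product

def responsibility_scores(classify, entity, domains):
    """Illustrative brute-force counterfactual search (hypothetical helper).

    classify: black-box function mapping a feature tuple to a label
    entity:   tuple of feature values under classification
    domains:  per-feature lists of admissible alternative values
    Returns a dict: feature index -> responsibility score, where a feature
    that belongs to a label-flipping change set S gets score 1/|S|
    (maximized over all such sets, so minimum-size sets dominate).
    """
    original = classify(entity)
    n = len(entity)
    scores = {i: 0.0 for i in range(n)}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # All combinations of alternative values for the chosen features.
            alts = [[v for v in domains[i] if v != entity[i]] for i in subset]
            for values in product(*alts):
                cand = list(entity)
                for i, v in zip(subset, values):
                    cand[i] = v
                if classify(tuple(cand)) != original:
                    # This intervention is a counterfactual entity.
                    for i in subset:
                        scores[i] = max(scores[i], 1.0 / size)
    return scores
```

For a conjunctive classifier on a binary entity `(1, 1)`, flipping either feature alone already changes the label, so both feature values get maximum responsibility; for a disjunctive classifier, both features must be changed together, halving the scores. The ASP-based approach in the paper replaces this exponential enumeration with declarative rules whose stable models encode the counterfactual entities.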