CEnt: An Entropy-based Model-agnostic Explainability Framework to Contrast Classifiers' Decisions

Current interpretability methods focus on explaining a particular model's decision through present input features. Such methods do not inform the user of the sufficient conditions that would alter these decisions when they are not desirable. Contrastive explanations circumvent this problem by providing explanations of the form "If the feature $X>x$, the output $Y$ would be different". While different approaches have been developed to find contrasts, these methods do not all deal with mutability and attainability constraints. In this work, we present a novel approach to locally contrast the prediction of any classifier. Our Contrastive Entropy-based explanation method, CEnt, approximates a model locally by a decision tree to compute entropy information of different feature splits. A graph, G, is then built where contrast nodes are found through a one-to-many shortest path search. Contrastive examples are generated from the shortest path to reflect feature splits that alter model decisions while maintaining lower entropy. We perform local sampling on manifold-like distances computed by variational auto-encoders to reflect data density. CEnt is the first non-gradient-based contrastive method generating diverse counterfactuals that do not necessarily exist in the training data while satisfying immutability (e.g., race) and semi-immutability (e.g., age can only change in an increasing direction). Empirical evaluation on four real-world numerical datasets demonstrates the ability of CEnt to generate counterfactuals that achieve better proximity rates than existing methods without compromising latency, feasibility, and attainability. We further extend CEnt to imagery data to derive visually appealing and useful contrasts between class labels on the MNIST and Fashion MNIST datasets. Finally, we show how CEnt can serve as a tool to detect vulnerabilities of textual classifiers.
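The tree-and-graph step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tree, the node names, the label values, and the edge weight (one plus the Shannon entropy of the child node) are all assumptions chosen for illustration, since the abstract does not specify how edges are weighted. The sketch computes the entropy of the labels at each tree node, links parents and children into an undirected graph, and runs a one-to-many shortest path search (Dijkstra) from the leaf holding the query point to find the nearest leaf with a different predicted label.

```python
import heapq
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shortest_paths(graph, source):
    """Dijkstra from one source to every reachable node (one-to-many)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def nearest_contrast_leaf(graph, leaf_labels, source):
    """Return the closest leaf whose label differs from the source leaf, plus the path."""
    dist, prev = shortest_paths(graph, source)
    candidates = [
        (d, n) for n, d in dist.items()
        if n in leaf_labels and leaf_labels[n] != leaf_labels[source]
    ]
    if not candidates:
        return None, []
    _, target = min(candidates)
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return target, path[::-1]


# Hypothetical surrogate tree: labels of the local samples reaching each node.
node_samples = {
    "root": ["deny", "deny", "deny", "approve", "approve", "approve"],
    "leaf_A": ["deny", "deny"],
    "inner": ["deny", "approve", "approve", "approve"],
    "leaf_B": ["deny"],
    "leaf_C": ["approve", "approve", "approve"],
}
edges = [("root", "leaf_A"), ("root", "inner"), ("inner", "leaf_B"), ("inner", "leaf_C")]

# Assumed edge weight: 1 + entropy of the child node, so purer splits are cheaper.
graph = {}
for parent, child in edges:
    w = 1.0 + entropy(node_samples[child])
    graph.setdefault(parent, []).append((child, w))
    graph.setdefault(child, []).append((parent, w))

leaf_labels = {"leaf_A": "deny", "leaf_B": "deny", "leaf_C": "approve"}
target, path = nearest_contrast_leaf(graph, leaf_labels, "leaf_A")
print(target, path)
```

Each edge on the returned path corresponds to a feature split that would need to change for the surrogate's prediction to flip. In a fuller treatment, mutability constraints could be expressed by removing or redirecting the edges that touch immutable or semi-immutable features.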
