Several explainable AI methods allow a machine learning user to gain insight into the classification process of a black-box model in the form of local linear explanations. With such information, the user can judge which features are locally relevant for the classification outcome and develop an understanding of how the model reasons. Standard supervised learning processes are purely driven by the original features and target labels, without any feedback loop informed by the local relevance of the features identified by the post-hoc explanations. In this paper, we exploit this newly obtained information to design a feature engineering phase, where we combine explanations with feature values. To do so, we develop two different strategies, named Iterative Dataset Weighting and Targeted Replacement Values, which generate streamlined models that better mimic the explanation process presented to the user. We show how these streamlined models compare to the original black-box classifiers in terms of accuracy and of the compactness of the newly produced explanations.