As machine learning (ML) models grow more complex and their predictions harder to explain, several methods have been developed to explain a model's behavior in terms of the training data points that most influence it. However, these methods tend to mark outliers as highly influential, limiting the insights practitioners can draw from points that are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior. Since practitioners may be interested in data points that are influential not only with respect to model accuracy but also with respect to other important metrics, we show how to evaluate training data points on the basis of group fairness. Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes. Our quantitative experiments and user studies show that visualizing DIVINE points helps practitioners understand and explain model behavior better than earlier approaches.
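The abstract does not spell out the selection objective, but the core idea of trading off influence against diversity can be sketched with a simple greedy procedure. The sketch below is illustrative only: the function name `select_divine_points`, the Euclidean distance used as a diversity term, and the weight `lam` are assumptions, not the authors' actual formulation; the influence scores are assumed to be precomputed by any importance-scoring method.

```python
import numpy as np

def select_divine_points(influence, X, k, lam=1.0):
    """Greedily pick k training points, trading off influence and diversity.

    influence : (n,) precomputed importance score per training point
    X         : (n, d) training features, used only to measure diversity
    k         : number of points to select
    lam       : weight on the diversity term (illustrative knob, not from the paper)
    """
    selected = []
    for _ in range(k):
        best_idx, best_gain = None, -np.inf
        for i in range(len(influence)):
            if i in selected:
                continue
            # Diversity gain: distance from candidate i to its nearest
            # already-selected point (zero for the first pick).
            if selected:
                div = min(np.linalg.norm(X[i] - X[j]) for j in selected)
            else:
                div = 0.0
            gain = influence[i] + lam * div
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
    return selected
```

With `lam = 0` this reduces to taking the top-k influential points, which is exactly the regime where outliers dominate; increasing `lam` spreads the selection across the data, matching the abstract's goal of influential points that also represent the training set well.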