DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement

As the complexity of machine learning (ML) models increases, resulting in a lack of prediction explainability, several methods have been developed to explain a model's behavior in terms of the training data points that most influence the model. However, these methods tend to mark outliers as highly influential points, limiting the insights that practitioners can draw from points that are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior. As practitioners might not only be interested in finding data points influential with respect to model accuracy, but also with respect to other important metrics, we show how to evaluate training data points on the basis of group fairness. Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes. Our quantitative experiments and user studies show that visualizing DIVINE points helps practitioners understand and explain model behavior better than earlier approaches.
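The selection step described above, choosing points that are both influential and representative of the training data, can be illustrated with a small greedy sketch. This is not necessarily the authors' exact procedure: the abstract does not specify the objective, so the code assumes an objective of the form "sum of importance scores plus `gamma` times a facility-location coverage term", with an RBF similarity on raw features. The function name `select_diverse_influential` and the parameters `gamma` and `k` are hypothetical names chosen for illustration.

```python
import numpy as np


def select_diverse_influential(scores, features, k, gamma=1.0):
    """Greedily pick k points, trading importance against coverage.

    Assumed objective (an illustration, not taken from the source):
        sum of scores of chosen points
        + gamma * sum_i max_{j chosen} sim(i, j)
    """
    scores = np.asarray(scores, dtype=float)
    features = np.asarray(features, dtype=float)
    n = len(scores)

    # Assumed similarity: RBF kernel on pairwise squared Euclidean distances.
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-sq_dists)

    selected = []
    chosen = np.zeros(n, dtype=bool)
    # best_cover[i]: similarity of point i to its closest selected point so far.
    best_cover = np.zeros(n)

    for _ in range(min(k, n)):
        # Marginal coverage gain of adding each candidate j (column j of sim).
        new_cover = np.maximum(sim, best_cover[:, None]).sum(axis=0)
        gain = scores + gamma * (new_cover - best_cover.sum())
        gain[chosen] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gain))
        selected.append(j)
        chosen[j] = True
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected


# Toy data: three high-scoring points clustered near 0, two low-scoring near 10.
toy_features = np.array([[0.0], [0.01], [0.02], [10.0], [10.01]])
toy_scores = np.array([1.0, 0.9, 0.8, 0.1, 0.1])

by_score_only = select_diverse_influential(toy_scores, toy_features, k=2, gamma=0.0)
with_diversity = select_diverse_influential(toy_scores, toy_features, k=2, gamma=1.0)
```

With `gamma=0.0` the sketch reduces to taking the top-scoring points, which on the toy data all come from the same tight cluster. With a positive `gamma`, the second pick is pulled toward the uncovered cluster, which is the behavior the abstract motivates: influential points that also represent the training data.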

