In this paper, we present a new explainability formalism designed to explain how each input variable of a test set impacts the predictions of machine learning models. Specifically, we propose a group explainability formalism for trained machine learning decision rules, based on their response to the variability of the input variables' distribution. In order to emphasize the impact of each input variable, this formalism uses an information-theoretic framework that quantifies the influence of all input-output observations based on entropic projections. It is thus the first unified and model-agnostic formalism enabling data scientists to interpret the dependence between the input variables, their impact on the prediction errors, and their influence on the output predictions. Convergence rates of the entropic projections are provided in the large-sample case. Most importantly, we prove that computing an explanation in our framework has a low algorithmic complexity, making it scalable to large real-life datasets. We illustrate our strategy by explaining complex decision rules learned by XGBoost, Random Forest, or Deep Neural Network classifiers on datasets such as Adult Income, MNIST, and CelebA. Finally, we clarify how our approach differs from the explainability strategies \textit{LIME} and \textit{SHAP}, which are based on single observations. Results can be reproduced using the freely distributed Python toolbox \url{https://gems-ai.com}.
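To give a concrete flavour of stress-testing a trained classifier against shifts in an input variable's distribution, the following minimal sketch reweights the test set with an exponential tilt of one feature (the classical maximum-entropy form of such a projection) and records the induced change in the error rate and in the predictions. The dataset, the tilting scheme, and all names are illustrative assumptions; this is neither the interface of the \url{https://gems-ai.com} toolbox nor the paper's exact algorithm.

\begin{verbatim}
# Hypothetical sketch, NOT the authors' toolbox API: measure how a trained
# classifier responds when the empirical test distribution of one feature is
# reweighted by an exponential tilt, i.e. the KL-minimal (maximum-entropy)
# reweighting that shifts that feature's mean.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

def tilted_weights(x, t):
    """Exponential tilt exp(t*x): reweights the empirical test distribution
    to stress large (t > 0) or small (t < 0) values of the feature x."""
    w = np.exp(t * (x - x.mean()) / (x.std() + 1e-12))
    return w / w.sum()

baseline_err = np.mean(y_hat != y_te)
for j in range(X_te.shape[1]):
    w = tilted_weights(X_te[:, j], t=1.0)
    err_shift = np.sum(w * (y_hat != y_te)) - baseline_err   # impact on errors
    pred_shift = np.sum(w * y_hat) - y_hat.mean()            # impact on outputs
    print(f"feature {j}: d_error={err_shift:+.3f}, d_pred={pred_shift:+.3f}")
\end{verbatim}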