We introduce a simple and intuitive framework that provides quantitative explanations of statistical models through the probabilistic assessment of input-feature importance. The core idea is to place a Dirichlet distribution over the importance of the input features and to learn it via approximate Bayesian inference. The learned importance has a probabilistic interpretation: it provides the relative significance of each input feature to a model's output, and it additionally quantifies the confidence in each importance estimate. Because the explanations follow a Dirichlet distribution, we can define a closed-form divergence to gauge the similarity between the importances learned under different models. We use this divergence to study the trade-offs between feature-importance explainability and essential notions in modern machine learning, such as privacy and fairness. Furthermore, our framework, BIF, operates on two levels: global explanation (feature importance across all data instances) and local explanation (individual feature importance for each data instance). We demonstrate the effectiveness of our method on a variety of synthetic and real datasets, covering both tabular and image data. The code is available at https://github.com/kamadforge/featimp_dp.
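To make the two main ingredients concrete, the sketch below (not the authors' implementation; the function name and the concentration parameters are hypothetical) shows how Dirichlet-distributed importance yields both a mean relative importance per feature and a closed-form KL divergence for comparing the importances learned under two different models:

```python
# Minimal sketch, assuming feature importance is parameterized by Dirichlet
# concentration parameters as the abstract describes. Values are illustrative.
import numpy as np
from scipy.special import digamma, gammaln


def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ) between two Dirichlets."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())


# Hypothetical importance over 4 features learned under two models,
# e.g. a non-private model vs. a differentially private one.
alpha = np.array([8.0, 4.0, 2.0, 1.0])   # concentrated on feature 0
beta = np.array([4.0, 4.0, 4.0, 4.0])    # near-uniform importance

print("mean importance (model A):", alpha / alpha.sum())
print("mean importance (model B):", beta / beta.sum())
print("KL divergence between explanations:", dirichlet_kl(alpha, beta))
```

Because the Dirichlet is conjugate-friendly and its KL divergence is available in closed form, comparing explanations across models reduces to a cheap computation over concentration parameters rather than a Monte Carlo estimate.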