We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.