Understanding the behavior of learned classifiers is an important task, and various black-box explanations, logical reasoning approaches, and model-specific methods have been proposed. In this paper, we introduce probabilistic sufficient explanations, which formulate explaining an instance of classification as choosing the "simplest" subset of features such that observing only those features is "sufficient" to explain the classification; that is, sufficient to give strong probabilistic guarantees that, under the data distribution, the model will behave similarly when all features are observed. In addition, we leverage tractable probabilistic reasoning tools such as probabilistic circuits and expected predictions to design a scalable algorithm that finds the desired explanations while keeping the guarantees intact. Our experiments demonstrate the effectiveness of our algorithm in finding sufficient explanations, and showcase its advantages compared to Anchors and logical explanations.
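To make the core idea concrete, the following is a minimal, self-contained sketch (not the paper's actual algorithm, which uses probabilistic circuits and expected predictions for tractability). It assumes a toy linear classifier over binary features and an independent feature distribution, both invented for illustration, and greedily grows the observed subset until the probability that the classifier's prediction matches the full-instance prediction, marginalizing out the unobserved features, reaches a threshold delta:

```python
from itertools import product

# Illustrative toy model (all weights/probabilities are made up, not from the paper):
weights = [2.0, 1.5, 1.0, 0.5]   # linear classifier weights
bias = -2.0
probs = [0.6, 0.5, 0.7, 0.4]     # P(x_i = 1) under an independent "data distribution"

def classify(x):
    """Toy binary classifier: threshold on a linear score."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)

def agreement(x, observed):
    """P(prediction equals classify(x)) when unobserved features are drawn from
    the feature distribution and the observed ones stay fixed to x's values."""
    target = classify(x)
    hidden = [i for i in range(len(x)) if i not in observed]
    total = 0.0
    for vals in product([0, 1], repeat=len(hidden)):
        y = list(x)
        p = 1.0
        for i, v in zip(hidden, vals):
            y[i] = v
            p *= probs[i] if v == 1 else 1 - probs[i]
        if classify(y) == target:
            total += p
    return total

def sufficient_explanation(x, delta=0.9):
    """Greedily add the feature that most increases the agreement probability
    until the probabilistic sufficiency threshold delta is met."""
    observed = set()
    while agreement(x, observed) < delta:
        best = max((i for i in range(len(x)) if i not in observed),
                   key=lambda i: agreement(x, observed | {i}))
        observed.add(best)
    return sorted(observed)
```

For the instance `[1, 1, 0, 0]`, observing only the first feature already forces the positive prediction under this toy model, so the greedy search returns `[0]` as the explanation. The brute-force marginalization here is exponential in the number of hidden features; the tractable probabilistic reasoning tools in the paper are what make this computation scale.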