Random Forests (RFs) are among the most widely used Machine Learning (ML) classifiers. Even though RFs are not interpretable, there are no dedicated non-heuristic approaches for computing explanations of RFs. Moreover, there is recent work on polynomial-time algorithms for explaining ML models, including naive Bayes classifiers. Hence, one question is whether finding explanations of RFs can be solved in polynomial time. This paper answers this question negatively, by proving that computing one PI-explanation of an RF is D^P-complete. Furthermore, the paper proposes a propositional encoding for computing explanations of RFs, thus enabling PI-explanations to be found with a SAT solver. This contrasts with earlier work on explaining boosted trees (BTs) and neural networks (NNs), which required encodings based on SMT/MILP. Experimental results, obtained on a wide range of publicly available datasets, demonstrate that the proposed SAT-based approach scales to RFs of sizes common in practical applications. Perhaps more importantly, the experimental results demonstrate that, for the vast majority of examples considered, the SAT-based approach proposed in this paper significantly outperforms existing heuristic approaches.
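To make the notion of a PI-explanation concrete, the following is a minimal sketch, not the paper's method. A PI-explanation is a subset-minimal set of feature values from the instance that, by itself, guarantees the forest's prediction. The sketch makes several assumptions that are not from the paper: features are Boolean, each tree is a hypothetical nested tuple, and the forest predicts by majority vote with ties broken toward the smallest class label. It also replaces the SAT oracle with brute-force enumeration of the free features, which is only feasible for toy inputs. The outer loop is the standard deletion-based linear search, which tries to drop each fixed feature in turn.

```python
from itertools import product

# Hypothetical tree format: ("leaf", cls) or ("node", feature, left, right),
# where `left` is followed when the Boolean feature is 0 and `right` when it is 1.
def predict_tree(tree, x):
    while tree[0] == "node":
        _, f, left, right = tree
        tree = right if x[f] else left
    return tree[1]

def predict_forest(forest, x):
    votes = {}
    for tree in forest:
        c = predict_tree(tree, x)
        votes[c] = votes.get(c, 0) + 1
    # Majority vote; ties go to the smallest class label (an assumption).
    return max(sorted(votes), key=lambda c: votes[c])

def is_sufficient(forest, x, fixed, target):
    """Brute-force stand-in for the SAT check: does fixing the features in
    `fixed` to their values in x force the prediction `target` for every
    assignment of the remaining (free) features?"""
    free = [i for i in range(len(x)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if predict_forest(forest, y) != target:
            return False
    return True

def pi_explanation(forest, x):
    """Deletion-based search: start with all features fixed, then try to free
    each one. Because sufficiency is preserved under supersets, the result
    is subset-minimal."""
    target = predict_forest(forest, x)
    fixed = set(range(len(x)))
    for i in range(len(x)):
        fixed.discard(i)
        if not is_sufficient(forest, x, fixed, target):
            fixed.add(i)  # feature i is needed to keep the prediction
    return sorted(fixed), target

# Toy forest over three Boolean features (hypothetical example).
forest = [
    ("node", 0, ("leaf", 0), ("leaf", 1)),
    ("node", 1, ("leaf", 0), ("leaf", 1)),
    ("node", 0, ("node", 2, ("leaf", 0), ("leaf", 1)), ("leaf", 1)),
]
print(pi_explanation(forest, [1, 1, 0]))  # ([0], 1)
```

For the instance [1, 1, 0], fixing feature 0 alone already makes two of the three trees vote for class 1, so {x0 = 1} is the explanation returned. The exponential loop in `is_sufficient` is the step that a propositional encoding of the forest and a SAT solver would replace.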