On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

From arXiv. 52 pages, including 48 pages of main text. This is an extended version of the AAAI conference paper "The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits" (arXiv:2007.14045), with additional results.

In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While computing Shapley values is intractable in general, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and classes of binary decision diagrams, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the $\mathsf{SHAP}$-score by observing that computing it over a class of Boolean models is always at least as hard, up to a polynomial-time reduction, as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing $\mathsf{SHAP}$-scores over this class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.
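To make the quantity concrete, here is a minimal brute-force sketch of a SHAP-score for a small Boolean model. It is not the polynomial-time algorithm from the paper. It assumes the uniform distribution over 0/1 entities and the usual Shapley weighting, where the value of a feature set $S$ is the expected model output over entities that agree with the given entity on $S$. The names `expected_value`, `shap_score`, and `monotone_dnf` are hypothetical and chosen for illustration only. The running time is exponential in the number of features, so the sketch is only meant for tiny examples.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_value(model, entity, fixed, n):
    """Average model output over all 0/1 entities that agree with
    `entity` on the feature indices in `fixed` (uniform distribution,
    an assumption of this sketch)."""
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        candidate = list(entity)
        for index, bit in zip(free, bits):
            candidate[index] = bit
        total += model(tuple(candidate))
    return Fraction(total, 2 ** len(free))


def shap_score(model, entity, x, n):
    """Brute-force SHAP-score of feature index `x` for `model` on `entity`.
    Enumerates every subset of the other features: exponential time."""
    others = [i for i in range(n) if i != x]
    score = Fraction(0)
    for k in range(n):
        # Shapley weight for coalitions of size k among n features.
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for subset in combinations(others, k):
            s = set(subset)
            with_x = expected_value(model, entity, s | {x}, n)
            without_x = expected_value(model, entity, s, n)
            score += weight * (with_x - without_x)
    return score


def monotone_dnf(e):
    """Illustrative monotone DNF formula: (x0 AND x1) OR x2."""
    return int((e[0] and e[1]) or e[2])


if __name__ == "__main__":
    entity = (1, 1, 1)
    for feature in range(3):
        print(feature, shap_score(monotone_dnf, entity, feature, 3))
```

Exact rational arithmetic via `Fraction` avoids floating-point noise, which makes it easy to check that the scores of all features sum to the model output on the entity minus the expected output over all entities.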
