通过计算复杂程度的镜头进行解释的模型 (Model Interpretability through the Lens of Computational Complexity)

from arxiv, 36 pages, including 9 pages of main text. This is the arXiv version of the NeurIPS'2020 paper. Except from minor differences that could be introduced by the publisher, the only difference should be the addition of the appendix, which contains all the proofs that do not appear in the main text

In spite of several claims stating that some models are more interpretable than others -- e.g., "linear models are more interpretable than deep neural networks" -- we still lack a principled notion of interpretability to formally compare among different classes of models. We make a step towards such a notion by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class $\mathcal{C}_1$ of models is more interpretable than another class $\mathcal{C}_2$, if the computational complexity of answering post-hoc queries for models in $\mathcal{C}_2$ is higher than for those in $\mathcal{C}_1$. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as P$\neq$NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.

翻译：尽管有人声称有些模型比其他模型更易解释,例如“线性模型比深神经网络更易解释”,但我们仍然缺乏解释不同模型类别之间正式比较的有原则的可解释性概念。我们研究民俗解释性索赔在计算复杂性理论方面是否有关联性,以此朝这个概念迈出了一步。我们侧重于本地的事后解释性询问,直觉地试图解答为什么个别投入以某种方式被某个特定模型归类。简言之,我们说,一个等级($\mathal{C_1美元)的模型比另一个等级($\mathcal{C_2美元)更易解释性概念。如果对以美元计算复杂性计算的模型后热解要求的计算复杂性高于以美元计算的复杂性($mathcal{C_2美元)的可解释性询问。我们证明,这个概念为当前基于某种解释性模型的理念提供了良好的理论对应物;特别是,我们指出,根据我们的定义和假设标准的复杂度-理论性假设(例如,可以严格地解释性解释的网络)不是以直线性和直线式分析结果。