We propose Partially Interpretable Estimators (PIE), which attribute a prediction to individual features via an interpretable model, while a (possibly small) part of the PIE prediction is attributed to feature interactions via a black-box model, with the goal of boosting predictive performance while maintaining interpretability. The interpretable model captures the main contributions of individual features, and the black-box model complements it by capturing the "nuances" of feature interactions as a refinement. We design an iterative training algorithm to jointly train the two types of models. Experimental results show that PIE is highly competitive with black-box models while outperforming interpretable baselines. In addition, the understandability of PIE is comparable to that of simple linear models, as validated via a human evaluation.
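To make the idea concrete, the following is a minimal sketch of one way such a partially interpretable decomposition could be trained, assuming an additive combination of an interpretable model and a black-box residual model fit by alternating updates; the class name `PIESketch`, the choice of `Ridge` and `GradientBoostingRegressor` as the two components, and the number of rounds are illustrative assumptions, not the paper's actual algorithm.

```python
# Sketch only: additive prediction y_hat = g(x) + f(x), where g is an
# interpretable model attributing contributions to individual features
# and f is a black-box model capturing residual feature interactions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor


class PIESketch:
    def __init__(self, n_rounds=5):
        self.glass_box = Ridge(alpha=1.0)             # interpretable part
        self.black_box = GradientBoostingRegressor()  # interaction "nuances"
        self.n_rounds = n_rounds

    def fit(self, X, y):
        residual_bb = np.zeros(len(y), dtype=float)
        for _ in range(self.n_rounds):
            # Fit the interpretable model on what the black box has not explained.
            self.glass_box.fit(X, y - residual_bb)
            residual_gb = y - self.glass_box.predict(X)
            # Fit the black box on what the interpretable model has not explained.
            self.black_box.fit(X, residual_gb)
            residual_bb = self.black_box.predict(X)
        return self

    def predict(self, X):
        return self.glass_box.predict(X) + self.black_box.predict(X)

    def explain(self, X):
        # Per-feature attributions come from the interpretable part only.
        return np.asarray(X) * self.glass_box.coef_
```

In this reading, interpretability is preserved because per-feature attributions are taken solely from the interpretable component, while the black-box component only refines the fit through interaction effects.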