Interpretability is becoming increasingly important in the analysis of predictive models. Unfortunately, as many authors have remarked, there is still no consensus on what this notion means. The goal of this paper is to define a score that allows interpretable algorithms to be compared quickly. The score combines three terms, each measured quantitatively with a simple formula: predictivity, stability, and simplicity. Predictivity, which has been studied extensively, measures the accuracy of a predictive algorithm. Stability is based on the Dice-Sorensen index, which compares two rule sets generated by the same algorithm from two independent samples. Simplicity is based on the sum of the lengths of the rules derived from the predictive model. The proposed score is a weighted sum of these three terms. We use it to compare the interpretability of a set of rule-based and tree-based algorithms in both the regression and the classification settings.
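The three ingredients named above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The rule representation (a rule as a frozenset of condition strings, its length being the number of conditions), the function names, the equal default weights, and the convention for two empty rule sets are all assumptions made for illustration. The abstract does not say how the sum of rule lengths is turned into a simplicity term, so the sketch computes only the raw sum and takes the simplicity term as an input.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a rule is a set of elementary conditions, and its length is
# the number of conditions it contains.
Rule = FrozenSet[str]


def dice_sorensen(rules_a: Set[Rule], rules_b: Set[Rule]) -> float:
    """Dice-Sorensen index 2|A & B| / (|A| + |B|) between two rule sets."""
    if not rules_a and not rules_b:
        return 1.0  # assumption: two empty rule sets count as identical
    return 2 * len(rules_a & rules_b) / (len(rules_a) + len(rules_b))


def total_rule_length(rules: Set[Rule]) -> int:
    """Sum of the lengths of all rules in a rule set."""
    return sum(len(rule) for rule in rules)


def weighted_score(
    predictivity: float,
    stability: float,
    simplicity: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted sum of the three terms; equal weights are an assumption."""
    w_pred, w_stab, w_simp = weights
    return w_pred * predictivity + w_stab * stability + w_simp * simplicity


# Hypothetical rule sets obtained from two independent samples.
rules_1 = {frozenset({"x1 > 0"}), frozenset({"x1 <= 0", "x2 > 3"})}
rules_2 = {frozenset({"x1 > 0"}), frozenset({"x2 <= 3"})}

stability = dice_sorensen(rules_1, rules_2)  # one shared rule out of 2 + 2
length = total_rule_length(rules_1)          # 1 + 2 conditions
```

Here the two rule sets share one rule, so the stability term is 0.5, and the first rule set has a total length of 3. Matching rules by exact equality of their conditions is the simplest choice. A real comparison would probably need a looser notion of rule equality, for example after discretizing the thresholds.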