Interpretability plays an increasingly important role in the design of machine learning algorithms. However, interpretable methods tend to be less accurate than their black-box counterparts. Among rule-based representations, DNFs (Disjunctive Normal Forms) are arguably the most interpretable way to express a set of rules. In this paper, we propose an effective bottom-up extension of the popular FIND-S algorithm to learn DNF-type rulesets. The algorithm greedily finds a partition of the positive examples. The produced DNF is a set of conjunctive rules, each corresponding to the most specific rule that covers one part of the positive examples while remaining consistent with all of the negative examples. We also propose two principled extensions of this method, approximating the Bayes Optimal Classifier by aggregating DNF decision rules. Finally, we provide a methodology to significantly improve the explainability of the learned rules while retaining their generalization capabilities. An extensive comparison with state-of-the-art symbolic and statistical methods on several benchmark data sets shows that our proposal provides an excellent balance between explainability and accuracy.
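To make the described procedure concrete, the following is a minimal sketch of a bottom-up, FIND-S-style DNF learner. It is not the paper's implementation: it assumes discrete feature dictionaries, takes the most specific conjunction covering a set of positives to be the set of (feature, value) literals they all agree on, and uses a simple first-fit greedy placement of each positive example into an existing part whenever the generalized rule still rejects every negative example.

import typing as t

Example = t.Dict[str, int]
Rule = t.Dict[str, int]  # conjunction of (feature == value) literals

def most_specific_rule(part: t.List[Example]) -> Rule:
    # keep only the literals on which all examples in the part agree
    rule = dict(part[0])
    for x in part[1:]:
        rule = {f: v for f, v in rule.items() if x.get(f) == v}
    return rule

def covers(rule: Rule, x: Example) -> bool:
    # a conjunction covers an example if every literal is satisfied
    return all(x.get(f) == v for f, v in rule.items())

def learn_dnf(positives: t.List[Example], negatives: t.List[Example]) -> t.List[Rule]:
    # greedy partition of the positives; each part yields one conjunctive rule
    parts: t.List[t.List[Example]] = []
    rules: t.List[Rule] = []
    for x in positives:
        placed = False
        for i, part in enumerate(parts):
            candidate = most_specific_rule(part + [x])
            # accept the merge only if the generalized rule rejects all negatives
            if not any(covers(candidate, n) for n in negatives):
                parts[i].append(x)
                rules[i] = candidate
                placed = True
                break
        if not placed:
            parts.append([x])
            rules.append(dict(x))
    return rules  # the DNF is the disjunction of these conjunctions

# toy usage
pos = [{"a": 1, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 1}, {"a": 0, "b": 0, "c": 0}]
neg = [{"a": 0, "b": 1, "c": 0}]
print(learn_dnf(pos, neg))  # e.g. [{'a': 1, 'c': 1}, {'a': 0, 'b': 0, 'c': 0}]

In this toy run the first two positives merge into a single rule (a = 1 AND c = 1) because the generalized conjunction still excludes the negative example, while the third positive starts a new part.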