Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only one for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability but also better predictive performance.
翻译:长期以来,对规则进行了研究,最近又由于需要可解释的模型而经常重新审查了规则的学习。然而,现有方法有一些缺陷:(1) 最近的方法要求将二进制特征矩阵作为投入,而直接从数字变量中学习规则则没有得到充分研究;(2) 现有方法在规则之间强加指令,无论是明示还是默示,这些指令都有害于解释;(3) 目前没有方法来学习多级目标变量的概率性规则(只有一种概率性规则列表); 我们建议TURS, 用于解决这些缺陷的真正的非有序规则集。 我们首先将学习真正非有序规则集的问题正式化。 为了解决由重叠规则引起的冲突,即由多重规则涵盖的情形,我们提出了一种新颖的方法,利用我们规则集的概率性特性;以及(3) 目前没有方法来学习多级目标变量(只有一种概率性规则列表)的两阶段超常性算法。 一个重要的创新是,我们在学习本地规则时,用一种替代的评分来考虑规则的全球潜力。最后,我们从经验上证明,与不精确地解释规则的方法相比,我们更准确地解释规则的稳定性和预测性方法。