Rule set learning has long been studied and has recently been revisited frequently because of the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, and learning rules directly from numeric variables is understudied; 2) existing methods impose an order among rules, either explicitly or implicitly, which harms interpretability; and 3) no method currently exists for learning probabilistic rule sets for multi-class target variables (there is only a method for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalise the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only are more interpretable (i.e., they are smaller and truly unordered), but also have better predictive performance.