Machine-learning models are ubiquitous. In some domains, such as medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models that present knowledge in the form of interpretable rules. When these models are learned from small datasets, their accuracy is usually low, and obtaining larger datasets is often difficult or even impossible. Pedagogical rule extraction methods could help learn better rules from small data by using statistical models to augment the dataset and then learning a rule-based model from the augmented data. However, existing evaluations of these methods are often inconclusive, and the methods have not yet been compared with one another. Our framework, PRELIM, unifies existing pedagogical rule extraction techniques. In extensive experiments, we identified promising PRELIM configurations that had not been studied before.
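The general idea of pedagogical rule extraction can be illustrated with a small pipeline. A statistical model is fitted to the small dataset and used to generate synthetic points. A "teacher" model trained on the original data labels those points. An interpretable rule is then learned from the original and synthetic data together. The sketch below is a minimal illustration under our own assumptions and is not PRELIM's implementation. The generator is an independent per-feature Gaussian, the teacher is a 3-nearest-neighbour classifier, and the rule learner is a single-split decision stump. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def fit_gaussian(X):
    """Hypothetical generator: per-feature mean and std (features treated as independent)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Fall back to std 1.0 for constant features so sampling still spreads points out.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return means, stds


def sample(generator, n, rng):
    """Draw n synthetic points from the fitted per-feature Gaussians."""
    means, stds = generator
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


def knn_label(X, y, x, k=3):
    """Hypothetical teacher: majority vote of the k nearest original points."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(X, y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def learn_stump(X, y):
    """Learn one rule 'if x[j] <= t then left_class else right_class' with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must put points on both sides
            l_cls, l_hits = Counter(left).most_common(1)[0]
            r_cls, r_hits = Counter(right).most_common(1)[0]
            errors = len(y) - l_hits - r_hits
            if best is None or errors < best[0]:
                best = (errors, j, t, l_cls, r_cls)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    _, j, t, l_cls, r_cls = best
    return j, t, l_cls, r_cls


def pedagogical_extract(X, y, n_synthetic=200, k=3, seed=0):
    """Augment (X, y) with teacher-labelled synthetic points, then learn a rule."""
    rng = random.Random(seed)
    X_new = sample(fit_gaussian(X), n_synthetic, rng)
    y_new = [knn_label(X, y, x, k) for x in X_new]
    return learn_stump(X + X_new, y + y_new)


def predict(rule, x):
    """Apply the learned single rule to one point."""
    j, t, l_cls, r_cls = rule
    return l_cls if x[j] <= t else r_cls


if __name__ == "__main__":
    # Ten labelled points: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 0.0], [4.0, 1.0],
         [10.0, 0.0], [11.0, 1.0], [12.0, 2.0], [13.0, 0.0], [14.0, 1.0]]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    j, t, l_cls, r_cls = pedagogical_extract(X, y)
    print(f"IF x[{j}] <= {t:.2f} THEN class {l_cls} ELSE class {r_cls}")
```

Each component is deliberately simple so that the data flow stays visible. A real configuration would swap in richer generators, teachers, and rule learners such as full decision trees or rule-set inducers.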