Machine learning has become omnipresent, with applications in various safety-critical domains such as medicine, law, and transportation. In these domains, high-stakes decisions produced by machine learning require researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective at representing the decision boundary through a set of rules over input features. The interpretability of rule-based classifiers generally depends on the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the direct brute-force approach is to solve an optimization problem that seeks the smallest classification rule with close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature, and thus does not scale to large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework, IMLI, based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in propositional logic. Despite the progress of MaxSAT solving in the last decade, a straightforward MaxSAT-based solution does not scale. Therefore, we incorporate an efficient incremental learning technique into the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. As an application, we deploy IMLI to learn popular interpretable classifiers such as decision lists and decision sets.
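To make the object being learned concrete, the following is a minimal illustrative sketch (not the authors' IMLI implementation) of a classification rule in propositional logic: a CNF rule is a conjunction (AND) of clauses, each clause a disjunction (OR) of binary input features, and its size, the total number of literals, is the interpretability measure the abstract refers to. The rule and samples below are hypothetical.

```python
def predict_cnf(rule, sample):
    """Predict positive iff every clause has at least one feature set to 1.

    rule: list of clauses; each clause is a list of feature indices.
    sample: sequence of 0/1 binary feature values.
    """
    return all(any(sample[i] for i in clause) for clause in rule)


def rule_size(rule):
    """Total number of literals; smaller rules are more interpretable."""
    return sum(len(clause) for clause in rule)


# Hypothetical rule over four binary features: (x0 OR x2) AND (x1 OR x3)
rule = [[0, 2], [1, 3]]

print(predict_cnf(rule, [1, 0, 0, 1]))  # True: x0 satisfies clause 1, x3 clause 2
print(predict_cnf(rule, [0, 1, 0, 0]))  # False: first clause is unsatisfied
print(rule_size(rule))                  # 4 literals
```

A MaxSAT formulation, as in the paper, would encode the choice of which literals appear in each clause as Boolean variables and trade off rule size against classification errors via soft constraints, which a MaxSAT solver then optimizes.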