Random forests have long been considered powerful model ensembles in machine learning. By training multiple decision trees, whose diversity is fostered through data and feature subsampling, the resulting random forest can lead to more stable and reliable predictions than a single decision tree. This, however, comes at the cost of decreased interpretability: while decision trees are often easily interpretable, the predictions made by random forests are much more difficult to understand, as they involve a majority vote over hundreds of decision trees. In this paper, we examine different types of reasons that explain "why" an input instance is classified as positive or negative by a Boolean random forest. Notably, as an alternative to sufficient reasons, which take the form of prime implicants of the random forest, we introduce majoritary reasons, which are prime implicants of a strict majority of decision trees. For these different abductive explanations, the tractability of the generation problem (finding one reason) and the minimization problem (finding one shortest reason) are investigated. Experiments conducted on various datasets reveal the existence of a trade-off between runtime complexity and sparsity. Sufficient reasons (for which the identification problem is DP-complete) are slightly larger than majoritary reasons, which can be generated using a simple linear-time greedy algorithm, and significantly larger than minimal majoritary reasons, which can be approached using an anytime Partial MaxSAT algorithm.
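As a rough illustration of the greedy construction mentioned in the abstract, the sketch below (Python, not the authors' implementation) computes one majoritary reason for a Boolean random forest: starting from the full term describing the instance, it tentatively drops each literal and keeps the deletion only if a strict majority of trees still individually imply the predicted class. The `Node` class, function names, and the naive forest-wide recheck after every deletion are illustrative assumptions; the algorithm described in the paper achieves this in linear time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Boolean decision tree node: internal nodes test a feature, leaves carry a class."""
    feature: Optional[int] = None   # None for leaves
    left: Optional["Node"] = None   # branch taken when the feature is 0
    right: Optional["Node"] = None  # branch taken when the feature is 1
    label: Optional[int] = None     # class label (0 or 1) at leaves


def tree_implies(node: Node, term: Dict[int, int], cls: int) -> bool:
    """True iff every completion of `term` reaches a leaf of class `cls`,
    i.e. `term` is an implicant of this single decision tree."""
    if node.feature is None:
        return node.label == cls
    if node.feature in term:                      # feature fixed by the term
        child = node.right if term[node.feature] == 1 else node.left
        return tree_implies(child, term, cls)
    # feature left free: both branches must yield the class
    return tree_implies(node.left, term, cls) and tree_implies(node.right, term, cls)


def majority_implies(forest: List[Node], term: Dict[int, int], cls: int) -> bool:
    """True iff a strict majority of the trees individually imply `cls` on `term`."""
    votes = sum(tree_implies(t, term, cls) for t in forest)
    return votes > len(forest) / 2


def greedy_majoritary_reason(forest: List[Node], instance: Dict[int, int], cls: int) -> Dict[int, int]:
    """Greedily drop literals from the instance term while a strict majority of
    trees still imply `cls`; the surviving literals form a majoritary reason."""
    assert majority_implies(forest, instance, cls), "instance must be classified as cls"
    term = dict(instance)
    for feature in list(term):
        value = term.pop(feature)                 # tentatively drop the literal
        if not majority_implies(forest, term, cls):
            term[feature] = value                 # restore it: the literal is needed
    return term
```

The subset-minimality of the output follows from the loop invariant: a literal is kept only when removing it breaks the strict-majority condition, so no literal of the returned term can be dropped. Finding a shortest such term is the harder minimization problem that the abstract addresses with an anytime Partial MaxSAT encoding.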