Our goal in this paper is to automatically extract a set of decision rules (a rule set) that best explains a classification data set. First, a large set of candidate decision rules is extracted from a set of decision trees trained on the data set. The final rule set, selected from these candidates, should be concise and accurate, with maximum coverage and a minimum number of inconsistencies. This selection problem can be formalized as a modified version of the weighted budgeted maximum coverage problem, which is known to be NP-hard. To solve this combinatorial optimization problem efficiently, we introduce a nested genetic algorithm, which we then use to derive explanations for ten public data sets.
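To make the underlying selection problem concrete, the following is a minimal sketch of the classic weighted budgeted maximum coverage problem, solved with a simple cost-benefit greedy heuristic. It is not the nested genetic algorithm introduced in the paper, and it does not model the paper's modifications (for example, the treatment of inconsistencies). The `Rule` class, the use of rule cost as a proxy for rule length, the sample weights, and the budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical label for the rule
    covered: frozenset   # indices of the samples the rule covers
    cost: float          # assumed cost, e.g. number of conditions; must be > 0


def greedy_budgeted_coverage(rules, weights, budget):
    """Repeatedly pick the affordable rule with the best ratio of newly
    covered sample weight to cost, until no affordable rule adds weight."""
    chosen, covered, spent = [], set(), 0.0
    remaining = list(rules)
    while remaining:
        best, best_ratio = None, 0.0
        for rule in remaining:
            if spent + rule.cost > budget:
                continue  # rule does not fit in the remaining budget
            gain = sum(weights[i] for i in rule.covered - covered)
            ratio = gain / rule.cost
            if ratio > best_ratio:
                best, best_ratio = rule, ratio
        if best is None:
            break  # nothing affordable improves coverage
        chosen.append(best)
        covered |= best.covered
        spent += best.cost
        remaining.remove(best)
    return chosen, covered


# Toy instance: five samples with unit weight and four candidate rules.
rules = [
    Rule("r1", frozenset({0, 1, 2}), cost=2.0),
    Rule("r2", frozenset({2, 3}), cost=1.0),
    Rule("r3", frozenset({4}), cost=1.0),
    Rule("r4", frozenset({0, 1, 2, 3, 4}), cost=5.0),
]
weights = {i: 1.0 for i in range(5)}
chosen, covered = greedy_budgeted_coverage(rules, weights, budget=4.0)
print([r.name for r in chosen], sorted(covered))
```

In this toy instance the single rule `r4` covers every sample but exceeds the budget, so the heuristic assembles full coverage from three cheaper rules instead. A greedy pass of this kind offers no optimality guarantee in general, which is one reason a metaheuristic such as a genetic algorithm may be preferred for the NP-hard problem.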