Sparse decision trees are one of the most common forms of interpretable models. While recent advances have produced algorithms that fully optimize sparse decision trees for prediction, that work does not address policy design, because the algorithms cannot handle weighted data samples. Specifically, they rely on the discreteness of the loss function, which means that real-valued weights cannot be directly used. For example, none of the existing techniques produce policies that incorporate inverse propensity weighting on individual data points. We present three algorithms for efficient sparse weighted decision tree optimization. The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets. Our second approach, which scales more efficiently, transforms weights to integer values and uses data duplication to transform the weighted decision tree optimization problem into an unweighted (but larger) counterpart. Our third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight. We present theoretical bounds on the error of the two fast methods and show experimentally that these methods can be two orders of magnitude faster than the direct optimization of the weighted loss, without losing significant accuracy.
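To make the two fast approaches concrete, the following is a minimal Python/NumPy sketch of the preprocessing each one performs before an unweighted sparse-tree optimizer is invoked. The function names `duplicate_by_weight` and `sample_by_weight`, and the parameters `precision` and `n_samples`, are illustrative assumptions, not names from the paper; the unweighted solver itself is treated as a black box.

```python
import numpy as np

def duplicate_by_weight(X, y, w, precision=2):
    """Second approach (sketch): round real-valued weights to integers
    and duplicate each sample that many times, so an unweighted sparse
    decision tree optimizer can be run on the enlarged dataset."""
    # Scale weights to keep `precision` digits, then round to integer counts.
    counts = np.rint(w * 10**precision).astype(int)
    idx = np.repeat(np.arange(len(w)), counts)
    return X[idx], y[idx]

def sample_by_weight(X, y, w, n_samples, seed=None):
    """Third approach (sketch): draw n_samples points with replacement,
    each chosen with probability proportional to its weight, so the
    expected loss on the sample matches the weighted loss."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(w), size=n_samples, replace=True, p=w / w.sum())
    return X[idx], y[idx]
```

Either transformed dataset would then be handed to an unweighted optimal-tree solver; the rounding precision and the sample count govern the trade-off between fidelity to the weighted loss and the size of the problem the solver must handle, which is where the stated error bounds apply.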