Prediction rule ensembles (PRE) provide interpretable prediction models with relatively high accuracy. PRE obtain a large set of decision rules from a (boosted) decision tree ensemble and achieve sparsity through Lasso-penalized regression. This article examines the use of surrogate models to improve the performance of PRE, wherein the Lasso regression is trained with the help of a massive dataset generated by the (boosted) decision tree ensemble. This use of model-based data generation may improve the stability and consistency of the Lasso step, thus leading to improved overall performance. We propose two surrogacy approaches and evaluate them on simulated and existing datasets in terms of sparsity and predictive accuracy. The results indicate that the use of surrogate models can substantially improve the sparsity of PRE while retaining predictive accuracy, especially through the use of a nested surrogacy approach.
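The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: scikit-learn's `GradientBoostingRegressor` stands in for the boosted tree ensemble, only terminal nodes are used as rules, surrogate inputs are drawn by resampling each feature independently from its observed values, and the helper names `rule_matrix` and `sample_surrogate_inputs` are hypothetical. The nested surrogacy approach is not specified in the abstract and is not attempted here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV


def rule_matrix(ensemble, X):
    """Binary matrix with one column per leaf of every tree in the ensemble.

    Each leaf stands for the conjunction of splits on its root-to-leaf path.
    Assumption: only terminal nodes are turned into rules.
    """
    leaves = ensemble.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    columns = []
    for t, tree in enumerate(ensemble.estimators_[:, 0]):
        # Leaves are the nodes without a left child (marked by -1).
        leaf_ids = np.flatnonzero(tree.tree_.children_left == -1)
        for leaf in leaf_ids:
            columns.append(leaves[:, t] == leaf)
    return np.column_stack(columns).astype(float)


def sample_surrogate_inputs(X, n_samples, rng):
    """Assumed sampling scheme: resample every feature independently."""
    n, p = X.shape
    idx = rng.integers(0, n, size=(n_samples, p))
    return X[idx, np.arange(p)]  # entry (i, j) is X[idx[i, j], j]


# Toy data, invented purely for this illustration.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 5))
y = (2.0 * (X[:, 0] > 0.5)
     + 1.0 * (X[:, 1] > 0.3) * (X[:, 2] < 0.7)
     + rng.normal(scale=0.5, size=300))

# Step 1: boosted tree ensemble that supplies the candidate rules.
ensemble = GradientBoostingRegressor(
    n_estimators=50, max_depth=2, learning_rate=0.1, random_state=0
).fit(X, y)

# Baseline PRE: Lasso on the rules, fitted to the original training data.
lasso_direct = LassoCV(cv=5, max_iter=10000, random_state=0)
lasso_direct.fit(rule_matrix(ensemble, X), y)

# Surrogate variant: generate a larger dataset, label it with the ensemble's
# predictions, and fit the Lasso to that generated data instead.
X_sur = sample_surrogate_inputs(X, 5000, rng)
y_sur = ensemble.predict(X_sur)
lasso_surrogate = LassoCV(cv=5, max_iter=10000, random_state=0)
lasso_surrogate.fit(rule_matrix(ensemble, X_sur), y_sur)

n_rules_direct = int(np.count_nonzero(lasso_direct.coef_))
n_rules_surrogate = int(np.count_nonzero(lasso_surrogate.coef_))
print("rules kept (direct):   ", n_rules_direct)
print("rules kept (surrogate):", n_rules_surrogate)
```

The sketch only shows the data flow between the ensemble, the generated dataset, and the Lasso step. It makes no claim about reproducing the reported sparsity gains, which depend on how the surrogate data are generated and how the penalty is selected.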