Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets-trees with similar performance but varying structures-can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.
翻译:稀疏决策树学习通过寻找(软)规模限制内最精确的单一树,提供了适用于高风险应用场景的准确且可解释的预测模型。相较于依赖单一“最优”树,Rashomon集合——即具有相似性能但结构各异的树集合——可用于增强变量重要性分析、丰富解释性,并允许用户选择更简单的树或满足利益相关者偏好(如公平性)的树,而无需将这些标准硬编码至目标函数中。然而,由于寻找最优树是NP难问题,枚举Rashomon集合本质上面临挑战。为此,我们提出SORTeD这一新颖框架,该框架提升了可扩展性,并按目标值顺序枚举Rashomon集合中的树,从而实现了任意时间行为。实验表明,与现有技术相比,SORTeD将运行时间减少了高达两个数量级。此外,SORTeD能够为任何可分离且全序的目标函数计算Rashomon集合,并支持使用其他可分离(及偏序)目标函数对集合进行后评估。这些进展共同使得在实际应用中探索Rashomon集合变得更加可行。