Sparse methods are the standard approach to obtaining interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy, at the cost of interpretability. However, the use of black-box methods has been heavily criticized for high-stakes decisions, and it has been argued that there does not have to be a trade-off between accuracy and interpretability. To combine high accuracy with interpretability, we generalize best subset selection to best split selection. Best split selection constructs a small number of sparse models that are learned jointly from the data and then combined in an ensemble. It determines these models by splitting the available predictor variables among them when fitting the data. The proposed methodology yields an ensemble of sparse and diverse models, each of which provides a possible explanation for the relationship between the predictors and the response. The high computational cost of best split selection motivates the need for computationally tractable approximations. We evaluate a method developed by Christidis et al. (2020) which can be seen as a multi-convex relaxation of best split selection.
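As a rough illustration of the idea, the toy sketch below performs best split selection by brute force on a small problem: every assignment of the predictors to one of a few models (or to none, which induces sparsity) is enumerated, a least-squares model is fit per group, the models' predictions are averaged, and the split with the smallest validation error is kept. The function name, the plain least-squares fits, and the exhaustive search are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

import numpy as np
from sklearn.linear_model import LinearRegression

def best_split_selection(X_train, y_train, X_val, y_val, n_models=2):
    """Brute-force best split selection: assign each predictor to one model
    or to no model at all, and keep the split with the lowest validation MSE."""
    p = X_train.shape[1]
    best_err, best_split = np.inf, None
    # Group g in 0..n_models-1 means "used by model g"; group n_models means "unused".
    for split in product(range(n_models + 1), repeat=p):
        preds, used = np.zeros(len(y_val)), 0
        for g in range(n_models):
            cols = [j for j in range(p) if split[j] == g]
            if not cols:              # this model received no predictors
                continue
            fit = LinearRegression().fit(X_train[:, cols], y_train)
            preds += fit.predict(X_val[:, cols])
            used += 1
        if used == 0:                 # all predictors dropped; nothing to score
            continue
        err = np.mean((y_val - preds / used) ** 2)  # simple averaging ensemble
        if err < best_err:
            best_err, best_split = err, split
    return best_split, best_err

# Toy usage: 6 predictors split among (at most) 2 models.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=120)
split, err = best_split_selection(X[:80], y[:80], X[80:], y[80:])
print("best split:", split, "validation MSE:", round(err, 4))
```

Note that the number of candidate splits grows as (G+1)^p for G models and p predictors, which is exactly the combinatorial cost that motivates the multi-convex relaxation of Christidis et al. (2020).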